Re: OSPF design [7:40269]

2002-04-07 Thread [EMAIL PROTECTED]

Aha!  I have found the answer to my own question about why the summary 
from R2 isn't used even if there is no null0 route. 
Answer - because the RFC says it shouldn't be ;-)
Specifically, RFC2328, 16.2(3), when considering summary LSAs to build the 
routing table...
"If it is a Type 3 summary-LSA, and the collection of destinations 
described by the summary-LSA equals one of the router's configured area 
address ranges (see Section 3.5), and the particular area address range is 
active, then the summary-LSA should be ignored.  "Active" means that there 
are one or more reachable (by intra-area paths) networks contained in the 
area range."
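The RFC step above can be modelled in a few lines of Python. This is a minimal sketch, not router code. The function names are hypothetical, and it covers only step (3) of section 16.2. It assumes R1's configured range is the 2.0.0.0 255.128.0.0 (/9) range quoted later in the thread, and that R1's own WAN links (2.101.0.0/16 and 2.109.0.0/16) are still reachable after the partition.

```python
import ipaddress

net = ipaddress.ip_network


def range_is_active(area_range, intra_area_nets):
    """A range is active if any intra-area reachable network lies inside it."""
    return any(n.subnet_of(area_range) for n in intra_area_nets)


def ignore_summary_lsa(lsa_prefix, configured_ranges, intra_area_nets):
    """RFC 2328 16.2(3): skip a Type 3 LSA equal to an active configured range."""
    return any(
        lsa_prefix == rng and range_is_active(rng, intra_area_nets)
        for rng in configured_ranges
    )


# R1 after the partition (prefixes from the thread, function names hypothetical).
r1_ranges = [net("2.0.0.0/9")]                            # area 2.1.0.0 range
r1_reachable = [net("2.101.0.0/16"), net("2.109.0.0/16")]  # R1's own sites

# R2's Type 3 for the same range, heard over area 0:
print(ignore_summary_lsa(net("2.0.0.0/9"), r1_ranges, r1_reachable))  # True
# Only if none of R1's networks in the range were reachable would it be used:
print(ignore_summary_lsa(net("2.0.0.0/9"), r1_ranges, []))  # False
```

Because R1's range stays active as long as any one of its own sites is up, R2's identical summary never makes it into R1's routing table, whether or not a null0 route exists.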

When all else fails, RTFM (or RTFRFC in this case).  All those people who 
knew this all along and were wondering what on earth I was confused about 
can now shake their heads and wonder at my ignorance  ;-)  This has been a 
most enlightening thread.

JMcL

- Forwarded by Jenny Mcleod/NSO/CSDA on 08/04/2002 03:29 pm -


"[EMAIL PROTECTED]"
>Adding a point to point link between ABRs would enhance the resiliency
>between the two and tend to protect against Area partitioning. Depending
>on the capabilities of the backbone routers, letting more specifics into
>the backbone might be helpful as well as it would deliver more optimal
>routing and also help solve this problem.
>
>Shorter answer is, ya, that's a good idea in my opinion :)
>
>Pete
>
>
[snipped]




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=40790&t=40269
--
FAQ, list archives, and subscription info: http://www.groupstudy.com/list/cisco.html
Report misconduct and Nondisclosure violations to [EMAIL PROTECTED]



Re: OSPF design [7:40269]

2002-04-07 Thread [EMAIL PROTECTED]

Peter,
comments inline...

JMcL
- Forwarded by Jenny Mcleod/NSO/CSDA on 08/04/2002 09:14 am -


"Peter van Oene" 
Sent by: [EMAIL PROTECTED]
06/04/2002 01:17 am
Please respond to "Peter van Oene"

 
To: [EMAIL PROTECTED]
cc: 
    Subject:        Re: OSPF design [7:40269]


New theory!  might work :)

My assumptions:

1) R1 and R2 are your ABRs, R2's link into the backbone is a dial on 
demand link only used when R1's link fails.
JMcL: Correct /JMcL

2) Due to the above, the primary problem is that when the non-backbone area
becomes partitioned, R1 will not be able to deliver to certain nets south
of R2 as it does not see R2 as a valid hop toward those nets (since it
doesn't see the type 1/2 advertisements from that area).  In this case, R1
either forwards via default toward the core and loops traffic for those
unreachable nets, or matches a null0 route for the summary and discards.
JMcL: R1 does not see the type 1/2 advertisements, but it DOES see the 
type 3 advertisements for the summary from R2 via the area 0 connection. 
IF there is no null0 route generated by R1 (either because generation is 
turned off or because the IOS version is such that the default is not to 
generate one), I would have expected this summary to be used to direct 
traffic for unreachable nets to R2.  According to the support guys, this 
is not the case as the problem happened even before the null0 route was 
generated.  /JMcL

3) R2 will have this problem only when R1 loses connectivity to the core
_and_ the non-backbone area becomes partitioned.  Hence, fixing this
problem is less important than fixing #2.
JMcL: Yes, if you mean that nets south of R1 (I like the phrasing - very 
descriptive) will be "lost" if R1 loses connectivity to the core and the 
non-backbone area is partitioned.  /JMcL 

Solution:

Disable the creation of a null0 route for the aggregate on R1 and instead 
add a static route for the aggregate on R1 toward R2.
JMcL: I suspect this would work, but I am still confused as to why the 
type 3 advertisement from R2 does not provide reachability if there is no 
null0 route.  As far as I can see, the static shouldn't be necessary (but 
probably is). 
As a side note, has anyone actually used the "no discard-route" command? 
That's the only way I've spotted to turn off the creation of the null0 
route, and it's basically undocumented - does it work as expected?  /JMcL 

With this config, if the area becomes partitioned while R1's ethernet
toward the core is live, then when R1 attracts traffic (via its summary) for
unreachable nets behind R2, the static route will push that traffic toward
R2.  Should R2 not be able to reach those nets, they can be safely
considered unreachable and R2's null0 route will discard the traffic,
thereby eliminating loops.  The only downside is that some truly unreachable
traffic might transit the R1-R2 link before being eliminated.

This will not help the situation where the area is partitioned and R1 loses
core connectivity, but this is a much less likely occurrence.  Plus, in
this case your dialup link might be strained anyway so dropping a bunch of
traffic might be helpful :)
JMcL: It's a big dialup link :-)  But you're right, that situation is much 
less likely.  /JMcL

In summary, assume 192.168/16 is the summary:

R1
! "R2" stands for R2's next-hop address; IOS expects an IP address or interface here
ip route 192.168.0.0 255.255.0.0 R2

R2
ip route 192.168.0.0 255.255.0.0 null0

Adding the cable is also helpful, but costs money and requires you to
touch a bunch of routers.

At 09:04 AM 4/5/2002 -0500, Peter van Oene wrote:
>Adding a point to point link between ABRs would enhance the resiliency
>between the two and tend to protect against Area partitioning. Depending
>on the capabilities of the backbone routers, letting more specifics into
>the backbone might be helpful as well as it would deliver more optimal
>routing and also help solve this problem.
>
>Shorter answer is, ya, that's a good idea in my opinion :)
>
>Pete
>
>
[snipped]




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=40774&t=40269



Re: OSPF design [7:40269]

2002-04-05 Thread Peter van Oene

New theory!  might work :)

My assumptions:

1) R1 and R2 are your ABRs, R2's link into the backbone is a dial on 
demand link only used when R1's link fails.

2) Due to the above, the primary problem is that when the non-backbone area 
becomes partitioned, R1 will not be able to deliver to certain nets south 
of R2 as it does not see R2 as a valid hop toward those nets (since it 
doesn't see the type 1/2 advertisements from that area).  In this case, R1 
either forwards via default toward the core and loops traffic for those 
unreachable nets, or matches a null0 route for the summary and discards.

3) R2 will have this problem only when R1 loses connectivity to the core
_and_ the non-backbone area becomes partitioned.  Hence, fixing this
problem is less important than fixing #2.

Solution:

Disable the creation of a null0 route for the aggregate on R1 and instead 
add a static route for the aggregate on R1 toward R2.

With this config, if the area becomes partitioned while R1's ethernet
toward the core is live, then when R1 attracts traffic (via its summary) for
unreachable nets behind R2, the static route will push that traffic toward
R2.  Should R2 not be able to reach those nets, they can be safely
considered unreachable and R2's null0 route will discard the traffic,
thereby eliminating loops.  The only downside is that some truly unreachable
traffic might transit the R1-R2 link before being eliminated.

This will not help the situation where the area is partitioned and R1 loses 
core connectivity, but this is a much less likely occurrence.  Plus, in 
this case your dialup link might be strained anyway so dropping a bunch of 
traffic might be helpful :)

In summary, assume 192.168/16 is the summary:

R1
! "R2" stands for R2's next-hop address; IOS expects an IP address or interface here
ip route 192.168.0.0 255.255.0.0 R2

R2
ip route 192.168.0.0 255.255.0.0 null0

Adding the cable is also helpful, but costs money and requires you to touch 
a bunch of routers.
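A rough way to check this proposal is to trace packets hop by hop with longest-prefix matching. The sketch below assumes a partition in which one site (a hypothetical 192.168.1.0/24) is reachable only from R1 and another (a hypothetical 192.168.2.0/24) only from R2, and that the core follows R1's summary. The site prefixes, router names and TTL are illustrative, not taken from a real config.

```python
import ipaddress

net = ipaddress.ip_network


def tables(r1_aggregate_hop):
    """Hypothetical tables during a partition; R1's route for the aggregate varies."""
    return {
        "core": {net("192.168.0.0/16"): "R1"},        # summary learned from R1
        "R1": {
            net("192.168.1.0/24"): "deliver",         # site still reachable from R1
            net("192.168.0.0/16"): r1_aggregate_hop,  # "null0" or static to "R2"
            net("0.0.0.0/0"): "core",
        },
        "R2": {
            net("192.168.2.0/24"): "deliver",         # site now reachable only via R2
            net("192.168.0.0/16"): "null0",           # R2 keeps its discard route
            net("0.0.0.0/0"): "core",
        },
    }


def trace(tbls, dest, start="core", ttl=16):
    """Follow longest-prefix-match next hops; return (path, outcome)."""
    addr = ipaddress.ip_address(dest)
    path, router = [start], start
    for _ in range(ttl):
        table = tbls[router]
        best = max((p for p in table if addr in p), key=lambda p: p.prefixlen)
        hop = table[best]
        if hop in ("deliver", "null0"):
            return path, hop
        path.append(hop)
        router = hop
    return path, "ttl expired"


proposal = tables("R2")
print(trace(proposal, "192.168.2.10"))          # (['core', 'R1', 'R2'], 'deliver')
print(trace(proposal, "192.168.9.9"))           # (['core', 'R1', 'R2'], 'null0')
print(trace(tables("null0"), "192.168.2.10"))   # (['core', 'R1'], 'null0')
```

With the static on R1, traffic for R2's sites is delivered and traffic for genuinely dead nets is dropped at R2 after one extra hop. With a null0 on R1 instead, R2's reachable sites are discarded at R1, which is the behaviour described in assumption 2.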

At 09:04 AM 4/5/2002 -0500, Peter van Oene wrote:
>Adding a point to point link between ABRs would enhance the resiliency
>between the two and tend to protect against Area partitioning.  Depending
>on the capabilities of the backbone routers, letting more specifics into
>the backbone might be helpful as well as it would deliver more optimal
>routing and also help solve this problem.
>
>Shorter answer is, ya, that's a good idea in my opinion :)
>
>Pete
>
>
>At 01:39 PM 4/4/2002 -0500, you wrote:
> >At 11:59 AM 4/4/02, Chuck wrote:
> > >that was going to be my guess as well. I've done a number of lab experiments
> > >with similar themes, and have in my own mind at least, confirmed what is
> > >stated in the RFC - that the only serious routing issue with partitioned
> > >non-backbone areas results from overlapping
> >
> >She does seem to have overlapping summarization, if that makes sense. She
> >said:
> >
> >The area range statements on Rtr2 are...
> >[various area 0 range statements snipped]
> >   area 2.1.0.0 range 2.0.0.0 255.128.0.0
> >   area 2.2.0.0 range 2.128.0.0 255.224.0.0
> >
> >On Rtr1 the statements are...
> >[same area 0 range statements snipped]
> >   area 2.1.0.0 range 2.0.0.0 255.128.0.0
> >
> >If you look at her ASCII art e-mail, you'll see that the WAN links were not
> >assigned contiguously unless I'm missing something. Rtr1 has 2.101.0.0/16
> >and 2.109.0.0/16. Rtr 2 has 2.120.0.0/16, 2.104.0.0/16, and 2.130.0.0/16
> >
> >It's probably too late now, but perhaps if all the WAN links connected to
> >Rtr 1 had been summarizable into a group that was distinct from the WAN
> >links connected to Rtr 2, she wouldn't have the problem?? (Of course, she
> >has that area 2.2.0.0 to deal with too, but perhaps it could be something
> >different entirely)
> >
> >But I don't think she's looking for a redesign. She's looking for a quick
> >fix for now. What did you guys think of the idea of adding another direct
> >connection between the two switches and putting it in area 2.1.0.0?
> >
> >Priscilla
> >
> >
> > >Chuck
> > >
> > >"Peter van Oene" wrote in message
> > >news:[EMAIL PROTECTED]...
> > > > Hi Jenny,
> > > >
> > > > Is it safe to say that your problem is that when your non backbone area
> > > > becomes partitioned, you lose reachability to one side of the
> > > > partition?  When you use large summaries to describe entire areas and
> > > > have multiple entry points into those areas themselves, this is a normal
> > > > occurrence.  If this is the problem, the solution likely involves the
> > > > use of less specific summaries per ABR, and/or greater L2 resiliency to
> > > > protect against partitions.  If that's not the problem, can you indicate
> > > > where I've misread the problem description?
> > > >
> > > > Thanks
> > > >
> > > > Pete
> > > >
> > > >
> > > >
> > > > At 09:05 PM 4/2/2002 -0500, [EMAIL PROTECTED] wrote:
> > > > >Hi all,
> > > > >
> > > > >This is actually a real-life scenario, but I think it throws up some
> > > > >interesting points about OSPF that some pe

Re: OSPF design [7:40269]

2002-04-05 Thread Peter van Oene

One quick point below.  Trimmed rest.

 Question from Jenny

> >One thing I'm not clear on, though, is why the problem (reportedly)
> >happened before we upgraded to IOS 12.1 - so before a route to null0 was
> >used for the summarised networks (we didn't add one manually).  Any ideas?
> >  I can understand why it's happening now, so this is more for my curiosity
> >and understanding.

Correct me if I'm wrong: post-12.1, IOS adds the null0 route on ABRs when
area ranges are used?

In any event, adding a null route for a summary address is usually a good 
thing.   Although these null routes do nothing to enhance reachability, 
they do prevent traffic from looping when reachability is lost.  In your 
case, if your non backbone area was partitioned and traffic arrived at the 
ABR which had no specific routes for the destination in question, this 
traffic would be forwarded toward default (or another less specific 
summary) assuming the null route didn't exist.  Hence, taking 192.168.1.1 as
an example, your core routers might prefer the 192.168/16 route from ABR1,
which might in turn prefer your core routers' 0.0.0.0/0, in which case the
traffic loops and you'll generate lots of useless forwarding.  Adding the
null route here would simply discard the
traffic gracefully.
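The loop-versus-discard difference can be shown with a two-router toy model. It assumes, as in the example above, that the core holds only ABR1's 192.168/16 summary for the destination and ABR1 holds only a default toward the core. The TTL of 8 is arbitrary and the function names are hypothetical.

```python
import ipaddress

net = ipaddress.ip_network


def build_tables(abr_has_null0):
    """Hypothetical tables: core follows ABR1's summary, ABR1 defaults to core."""
    abr1 = {net("0.0.0.0/0"): "core"}
    if abr_has_null0:
        abr1[net("192.168.0.0/16")] = "null0"  # discard route for the summary
    return {"core": {net("192.168.0.0/16"): "ABR1"}, "ABR1": abr1}


def forward(tables, dest, ttl=8):
    """Hop-by-hop longest-prefix match from the core; returns (hops, outcome)."""
    addr = ipaddress.ip_address(dest)
    router, hops = "core", 0
    while ttl > 0:
        table = tables[router]
        best = max((p for p in table if addr in p), key=lambda p: p.prefixlen)
        if table[best] == "null0":
            return hops, "discarded"
        router, hops, ttl = table[best], hops + 1, ttl - 1
    return hops, "ttl expired"


print(forward(build_tables(abr_has_null0=False), "192.168.1.1"))  # (8, 'ttl expired')
print(forward(build_tables(abr_has_null0=True), "192.168.1.1"))   # (1, 'discarded')
```

Without the null route the packet bounces between the core and ABR1 until its TTL runs out; with it, ABR1 drops the packet on its first lookup.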

Interestingly, if your ABR1 happened to default toward ABR2 for some 
reason, not having the null route would actually enable you to route around 
the problem.  However, this would only help half the area and implementing 
a default on ABR2 toward ABR1 at the same time to fix the other half would 
be a case of not looking at the whole picture :)  Some topologies might 
actually benefit from a design of this nature which might explain why the 
null route wasn't automatically added before.  However, more topologies 
would benefit from having it which likely explains why Cisco changed their 
default behavior.




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=40593&t=40269



Re: OSPF design [7:40269]

2002-04-05 Thread Peter van Oene

Please pardon the snipping (and top posting for that matter)  Posted some 
notes inline.


 >Peter, when you say that the solution could involve "less specific
> >summaries" - do you really mean more specific summaries?  Summarising less
> >drastically (e.g. summarising each site separately) isn't a good solution
> >in this particular case because it creates too much load in the core -
> >that's how we used to do it but it created other problems.

Yes.  Thanks for catching one of my ever more frequent brain farts :)  I 
definitely meant to suggest that using more specific summaries on the ABRs 
would help.  Possibly pinning up major aggregates to null0 for the entire 
area and leaking appropriate specifics per ABR might help.   However, one 
would have to consider the impact on the core of both the additional type 
3's and the additional processing required to track their state (and their 
stability etc)
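To illustrate why leaked specifics help, here is a hypothetical view of Rtr1's forwarding table after area 2.1.0.0 partitions. It assumes both ABRs keep the 2.0.0.0/9 aggregate pinned to null0 and that Rtr2 also advertises its own WAN prefixes (2.120.0.0/16 and 2.104.0.0/16, from Priscilla's list) into area 0; how an ABR would be configured to do that is left open. Since those specifics are not equal to Rtr1's configured range, the RFC 2328 16.2(3) rule quoted at the top of the thread would not suppress them.

```python
import ipaddress

net = ipaddress.ip_network

# Hypothetical Rtr1 table after the partition (prefixes from the thread).
RTR1_TABLE = [
    (net("2.0.0.0/9"), "null0"),               # local discard route for the aggregate
    (net("2.101.0.0/16"), "intra-area"),       # Rtr1's own sites
    (net("2.109.0.0/16"), "intra-area"),
    (net("2.120.0.0/16"), "Rtr2 via area 0"),  # specifics leaked by Rtr2
    (net("2.104.0.0/16"), "Rtr2 via area 0"),
]


def next_hop(table, dest):
    """Longest-prefix match over (prefix, next hop) pairs; None if no match."""
    addr = ipaddress.ip_address(dest)
    matches = [(p, hop) for p, hop in table if addr in p]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None


for dest in ("2.101.1.1", "2.120.1.1", "2.50.1.1"):
    print(dest, "->", next_hop(RTR1_TABLE, dest))
# 2.101.1.1 -> intra-area
# 2.120.1.1 -> Rtr2 via area 0
# 2.50.1.1 -> null0
```

Longest-prefix matching sends traffic for Rtr2's sites across area 0 to Rtr2, while destinations covered only by the aggregate still hit the discard route. The cost is one extra Type 3 per leaked prefix in the backbone, which is the core-load trade-off mentioned above.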



>As you should be able to see, each of these can be valid assumptions
>depending on your network objectives.  Peter, how does JunOS deal
>with this situation?

JunOS behaves much like Cisco in that we'll advertise the summary so long 
as we match a contributing specific.  There is currently no additional 
"conditional" type capabilities available.  However, given the service 
provider focus in JunOS, I tend to think that there hasn't been that much 
pressure for type 3 handling enhancements.  In these networks, OSPF 
provides reachability toward loopbacks for IBGP peering and more 
importantly, BGP next-hop resolution where path accuracy is pretty 
important.  Sub-optimal routing for transit traffic burns money :) Further, 
LSDBs are generally kept as small as possible (no type 5's for example) 
which minimizes the need for summarization from a router processing 
perspective.  If folks summarize at all, it's only for link addresses in a
PoP.

I actually prefer ISIS for use in networks of this nature as the 
distribution of reachability information between levels of the hierarchy 
tends to be less restrictive in most implementations.  In JunOS (and IOS to 
some extent), one can use policies (route-maps in IOS) to govern the flow 
of information between areas instead of having to try and manipulate a 
summarization knob.  In this case, one can leak prefixes without worrying 
about what summary range they fall into.  Further, one can advertise 
aggregates and leak various specifics at the same time which can also be 
helpful in some cases.

>What would be really nice is if Cisco extended BGP conditional
>advertisement to IGPs, and introduced a knob to have the default
>behavior overridden by conditional.
>
> >I think in this case I'll be going for the "protect against partitioning"
> >solution and bung in another cable.

Wanted to voice my admiration for your verb selection here :)  Bung is 
definitely a cool way to describe a number of solutions I've seen in the 
past.  This one being far less bunged up than others I should add.




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=40591&t=40269



Re: OSPF design [7:40269]

2002-04-05 Thread Howard C. Berkowitz

>Comments below...
>
>Thanks,
>JMcL
>- Forwarded by Jenny Mcleod/NSO/CSDA on 05/04/2002 03:25 pm -
>
>
>"Howard C. Berkowitz"
>Sent by: [EMAIL PROTECTED]
>05/04/2002 02:09 pm
>Please respond to "Howard C. Berkowitz"
>
>
> To:     [EMAIL PROTECTED]
> cc:
> Subject:Re: OSPF design [7:40269]
>
>
>Jenny,
>
>First, I apologize for not giving more of a response earlier, but
>it's been a crazy few days...three people in my office, including
>myself, have had close relatives/friends in surgery and there have
>been a lot of distractions.
>JMcL: Err.. yes, I can see how that would be distracting.  Thanks for
>taking the time for this. /JMcL
>
>I'm going to post and elaborate a bit on some observations I sent to
>you earlier, but I'm interested in why and how you have so much core
>trouble.  Could you give us an idea of the number of routes and of
>routers, and the stability of both, in the non-backbone areas?  Are
>the ABRs and any pure backbone routers doing any other
>processor-intensive tasks?
>JMcL: The non-backbone areas (about twenty of them) vary quite a bit in
>size as they map (or did once) to geographic/administrative regions.  As
>they consist of multiple geographically-dispersed small offices with two
>routers each (for redundancy), they are pretty router-rich - the smallest
>area has 20 routers and 21 networks, the largest (I think) has 52/49 in
>area x.1.0.0 and 29/27 in x.2.0.0.
>While they aren't too bad for stability, the sheer number of sites means
>that something is usually playing up somewhere :-(

Those numbers don't sound too bad. But I think the villains are below.

>The ABRs mentioned in the problem below aren't doing anything very
>exciting, but some of the core routers have a fair load.  There are
>currently 50 routers in the backbone area - the backbone area is spread
>across two data centres and the ABRs mentioned (which are in sites around
>the country - they have WAN connections to the data centres, not LAN).

First, while I know of backbone areas that do have hundreds of 
routers (Pat Murphy at the US Geological Survey--but he's also an 
OSPF protocol developer), generally it's a bad idea.  The larger 
cores that I've built recently had certainly no more than 20-32 
routers.

Given you've got two data centres (see, I can spell in Oz), a natural 
split would be to center one area 0.0.0.0 on each data center, and 
have local areas (i.e., nonzero) even at the physical data centre. 
Why should such things as server-to-server backup, etc., be 
traversing the core?
Without knowing your Internet connectivity requirements, you could 
link the backbone areas (i.e., two OSPF domains) with multiple static 
routes (adding floating for backup).
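The floating-static idea boils down to route selection by administrative distance. The primary static wins while its link is up, and the backup (configured with a higher distance) is installed only when the primary goes away. A minimal sketch follows. The prefix, link names and the distance of 200 are hypothetical, and treating a route as usable just because its link is up is a simplification of how a router validates a static next hop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticRoute:
    prefix: str      # destination in the other OSPF domain (hypothetical)
    next_hop: str    # name of the inter-domain link (hypothetical)
    distance: int    # administrative distance; lower is preferred


# One primary static and one floating static toward the other data centre.
ROUTES = [
    StaticRoute("10.128.0.0/9", "link-to-east-dc-a", 1),
    StaticRoute("10.128.0.0/9", "link-to-east-dc-b", 200),
]


def installed_route(routes, links_up) -> Optional[StaticRoute]:
    """Pick the usable route (next hop up) with the lowest distance."""
    usable = [r for r in routes if r.next_hop in links_up]
    return min(usable, key=lambda r: r.distance) if usable else None


both_up = {"link-to-east-dc-a", "link-to-east-dc-b"}
print(installed_route(ROUTES, both_up).next_hop)                # link-to-east-dc-a
print(installed_route(ROUTES, {"link-to-east-dc-b"}).next_hop)  # link-to-east-dc-b
print(installed_route(ROUTES, set()))                           # None
```

The sketch handles a single prefix; a real table would run the same selection independently for each destination.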


>Core routers in the data centres also support CIP cards, may be ABRs for
>other areas (we're not very good at "pure" backbone routers ;-), and until
>recently terminated stacks of DLSw circuits.

This is bad news.  And remember, in OSPF (as opposed to ISIS), the 
_router_ is not in any specific area. It is the _interfaces_ that are 
in an area.

If, hypothetically, you were to create a local area in the data 
centre for the IBM machines, all it would take is changing the 
network statements for the interfaces going to that area.
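Howard's point that areas belong to interfaces, not routers, can be expressed as a small classification function following the router types in RFC 2328 section 3.3 (internal, area border, backbone). The interface names and the new area number below are hypothetical.

```python
def ospf_roles(interface_areas):
    """Classify a router from the areas its OSPF interfaces belong to (RFC 2328, 3.3)."""
    areas = set(interface_areas.values())
    roles = {"area border"} if len(areas) > 1 else {"internal"}
    if "0.0.0.0" in areas:
        roles.add("backbone")
    return roles


# Hypothetical data-centre router: every OSPF interface currently in area 0.
dc_router = {"if-core-a": "0.0.0.0", "if-core-b": "0.0.0.0", "if-to-cip": "0.0.0.0"}
print(sorted(ospf_roles(dc_router)))   # ['backbone', 'internal']

# Changing only the network statement that covers the CIP-facing interface:
dc_router["if-to-cip"] = "3.0.0.0"     # hypothetical new local area
print(sorted(ospf_roles(dc_router)))   # ['area border', 'backbone']
```

Moving one interface into a new area is enough to turn a pure backbone router into an ABR; nothing else about the router changes.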

Incidentally, there's a sneaky cost saving you can use for CIP cards. 
7000 series routers support them, but don't have very fast CPUs.  But 
you don't need a fast CPU to support the CIP itself, because it has 
its own fast CPU.  You do need substantial CPU power for terminating 
the IBM tunnels.

A trick I used a good deal (and by the equipment types, you'll see 
this is fairly old), is to put the CIP into a 3-slot 7010, or two if 
I needed redundancy.  I then ran the fastest available medium -- 
mostly FDDI at the time -- back to 4500/4700 series routers, which 
were the first RISC processor routers. They terminated RSRB, did IBM 
conversions, and all the other things that were processor intensive. 
Given that there was a shared medium, I could use multiple 4x00s if 
necessary.

>We also have adjusted the OSPF timers throughout the network to make them
>more sensitive - this because we had SNA traffic (first via RSRB, then
>DLSw) and we wanted fast failover.

DLSw doesn't have the local acknowledgement problem of RSRB. You may 
be able to start returning the timers to the normal values.

>This worked, but does make OSPF a bit
>more inclined to hysteria when there are links flapping.  This is now
>being phased out as we have moved to TN3270, but the timers haven't all
>been changed back yet.
>We possibly could go back to advertising each site separately now, since
>we've reduced the load in the co

Re: OSPF design [7:40269]

2002-04-05 Thread Peter van Oene

Adding a point to point link between ABRs would enhance the resiliency
between the two and tend to protect against Area partitioning.  Depending
on the capabilities of the backbone routers, letting more specifics into
the backbone might be helpful as well as it would deliver more optimal
routing and also help solve this problem.

Shorter answer is, ya, that's a good idea in my opinion :)

Pete


At 01:39 PM 4/4/2002 -0500, you wrote:
>At 11:59 AM 4/4/02, Chuck wrote:
> >that was going to be my guess as well. I've done a number of lab experiments
> >with similar themes, and have in my own mind at least, confirmed what is
> >stated in the RFC - that the only serious routing issue with partitioned
> >non-backbone areas results from overlapping
>
>She does seem to have overlapping summarization, if that makes sense. She
>said:
>
>The area range statements on Rtr2 are...
>[various area 0 range statements snipped]
>   area 2.1.0.0 range 2.0.0.0 255.128.0.0
>   area 2.2.0.0 range 2.128.0.0 255.224.0.0
>
>On Rtr1 the statements are...
>[same area 0 range statements snipped]
>   area 2.1.0.0 range 2.0.0.0 255.128.0.0
>
>If you look at her ASCII art e-mail, you'll see that the WAN links were not
>assigned contiguously unless I'm missing something. Rtr1 has 2.101.0.0/16
>and 2.109.0.0/16. Rtr 2 has 2.120.0.0/16, 2.104.0.0/16, and 2.130.0.0/16
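Priscilla's point can be checked with Python's ipaddress module. The sketch below uses the WAN prefixes and area ranges quoted above (255.128.0.0 is /9, 255.224.0.0 is /11); the helper names are hypothetical. It finds the most specific single prefix covering Rtr1's links and shows that one of Rtr2's links falls inside it.

```python
import ipaddress

net = ipaddress.ip_network

# Prefixes as quoted in the thread; helper names are hypothetical.
rtr1_links = [net("2.101.0.0/16"), net("2.109.0.0/16")]
rtr2_links = [net("2.120.0.0/16"), net("2.104.0.0/16"), net("2.130.0.0/16")]
ranges = {"area 2.1.0.0": net("2.0.0.0/9"), "area 2.2.0.0": net("2.128.0.0/11")}


def covering_range(link):
    """Return the configured range (by area) that would summarize this link."""
    for area, rng in ranges.items():
        if link.subnet_of(rng):
            return area
    return None


def smallest_supernet(prefixes):
    """Most specific single prefix that contains every prefix given."""
    candidate = prefixes[0]
    while not all(p.subnet_of(candidate) for p in prefixes):
        candidate = candidate.supernet()
    return candidate


for link in rtr1_links + rtr2_links:
    print(link, "->", covering_range(link))

rtr1_block = smallest_supernet(rtr1_links)
print(rtr1_block)  # 2.96.0.0/12
print([str(p) for p in rtr2_links if p.subnet_of(rtr1_block)])  # ['2.104.0.0/16']
```

Since 2.104.0.0/16 (on Rtr2) sits between Rtr1's 2.101 and 2.109 links, no single summary for Rtr1 can exclude it; the addressing would have to change first.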
>
>It's probably too late now, but perhaps if all the WAN links connected to
>Rtr 1 had been summarizable into a group that was distinct from the WAN
>links connected to Rtr 2, she wouldn't have the problem?? (Of course, she
>has that area 2.2.0.0 to deal with too, but perhaps it could be something
>different entirely)
>
>But I don't think she's looking for a redesign. She's looking for a quick
>fix for now. What did you guys think of the idea of adding another direct
>connection between the two switches and putting it in area 2.1.0.0?
>
>Priscilla
>
>
> >Chuck
> >
> >"Peter van Oene" wrote in message
> >news:[EMAIL PROTECTED]...
> > > Hi Jenny,
> > >
> > > Is it safe to say that your problem is that when your non backbone area
> > > becomes partitioned, you lose reachability to one side of the
> > > partition?  When you use large summaries to describe entire areas and
> > > have multiple entry points into those areas themselves, this is a normal
> > > occurrence.  If this is the problem, the solution likely involves the
> > > use of less specific summaries per ABR, and/or greater L2 resiliency to
> > > protect against partitions.  If that's not the problem, can you indicate
> > > where I've misread the problem description?
> > >
> > > Thanks
> > >
> > > Pete
> > >
> > >
> > >
> > > At 09:05 PM 4/2/2002 -0500, [EMAIL PROTECTED] wrote:
> > > >Hi all,
> > > >
> > > >This is actually a real-life scenario, but I think it throws up some
> > > >interesting points about OSPF that some people may not have come across.
> > > >And it has a couple of bits that I don't understand.  Please excuse the
> > > >verbosity.
> > > >
> > > >Currently, (part of) this particular network is as described below.  It
> > > >normally works fine, but during certain types of failures, connectivity
> > > >breaks although there is still a physical path.  I am contemplating what
> > > >the best way to fix it would be, and would be interested in comments.
> > > >
> > > >Set-up - I don't think my ascii art is up to this but I'll give it a go
> > > >if the description isn't clear enough:
> > > >
> > > >Two ABRs (Rtr1 and Rtr2), running IOS 12.1, connected to each other by a
> > > >direct ethernet cable in area 0, and also by several local ethernet
> > > >networks in area 2.1.0.0.  The details of the local ethernets can probably
> > > >remain a fluffy cloud, but note that failure of a single component can
> > > >potentially cause all area 2.1.0.0 neighbour connectivity between Rtr1 and
> > > >Rtr2 to be lost, although the local ethernets may remain up on one or both
> > > >routers.
> > > >
> > > >Both routers have a connection back to the core of the network (on Rtr2 it
> > > >is dialup, so not usually active), which is in area 0.  Both routers have
> > > >WAN links to several sites (not dual-homed - each site has a link to only
> > > >one ABR), in area 2.1.0.0.  Rtr2 may also have WAN links to several sites
> > > >in area 2.2.0.0, but that's probably not too relevant.
> > > >
> > > >Both ABRs summarise the networks in area 2.1.0.0 to a single summary
> > > >network (Rtr2 summarises the networks in 2.2.0.0, if any, to another
> > > >summary network).
> > > >
> > > >This usually works fine - traffic from the core to sites connected to Rtr2
> > > >(in area 2.1.0.0) travels from Rtr1 to Rtr2 across the local ethernets
> > > >(area 2.1.0.0), and in reverse from Rtr2 to Rtr1 across the Area 0
> > > >ethernet.  This, while perhaps not ideal, is as expected, and works well
> > > >under normal circumstances.  (If you're not sure why this is expected,
> > > >read

Re: OSPF design [7:40269]

2002-04-04 Thread [EMAIL PROTECTED]

Comments below...

Thanks,
JMcL
- Forwarded by Jenny Mcleod/NSO/CSDA on 05/04/2002 03:25 pm -


"Howard C. Berkowitz" 
Sent by: [EMAIL PROTECTED]
05/04/2002 02:09 pm
Please respond to "Howard C. Berkowitz"

 
To: [EMAIL PROTECTED]
cc: 
    Subject:        Re: OSPF design [7:40269]


Jenny,

First, I apologize for not giving more of a response earlier, but 
it's been a crazy few days...three people in my office, including 
myself, have had close relatives/friends in surgery and there have 
been a lot of distractions.
JMcL: Err.. yes, I can see how that would be distracting.  Thanks for 
taking the time for this. /JMcL

I'm going to post and elaborate a bit on some observations I sent to 
you earlier, but I'm interested in why and how you have so much core 
trouble.  Could you give us an idea of the number of routes and of 
routers, and the stability of both, in the non-backbone areas?  Are 
the ABRs and any pure backbone routers doing any other 
processor-intensive tasks?
JMcL: The non-backbone areas (about twenty of them) vary quite a bit in 
size as they map (or did once) to geographic/administrative regions.  As 
they consist of multiple geographically-dispersed small offices with two 
routers each (for redundancy), they are pretty router-rich - the smallest 
area has 20 routers and 21 networks, the largest (I think) has 52/49 in 
area x.1.0.0 and 29/27 in x.2.0.0. 
While they aren't too bad for stability, the sheer number of sites means 
that something is usually playing up somewhere :-(
The ABRs mentioned in the problem below aren't doing anything very 
exciting, but some of the core routers have a fair load.  There are 
currently 50 routers in the backbone area - the backbone area is spread 
across two data centres and the ABRs mentioned (which are in sites around 
the country - they have WAN connections to the data centres, not LAN). 
Core routers in the data centres also support CIP cards, may be ABRs for 
other areas (we're not very good at "pure" backbone routers ;-), and until 
recently terminated stacks of DLSw circuits. 
We also have adjusted the OSPF timers throughout the network to make them 
more sensitive - this because we had SNA traffic (first via RSRB, then 
DLSw) and we wanted fast failover.  This worked, but does make OSPF a bit 
more inclined to hysteria when there are links flapping.  This is now 
being phased out as we have moved to TN3270, but the timers haven't all 
been changed back yet.
We possibly could go back to advertising each site separately now, since 
we've reduced the load in the core by various other methods, but I 
wouldn't want to battle the layer 8 issues to do it.
/JMcL

There can be creative solutions if you think outside the traditional 
OSPF box. Hypothetically, if your address plan split geographically, 
it might even be an idea to have an eastern and western OSPF domain 
(i.e., an area 0.0.0.0 and a set of nonzero areas), linked by 
redundant static routes or possibly BGP.  The latter is especially 
useful if you have multiple ISP connections.  Remember also that a 
router can have multiple OSPF processes, so the same router could 
participate in different domains. I assume your user population 
stretches across at least three time zones, so this sort of redesign 
might localize some core thrashing.
JMcL: I don't think we have too much time-based thrashing - even though 
most of our population is in the same time zone (especially in winter). 
Splitting our core is something that has frequently been considered, and 
in fact it was originally split, with very ugly redistribution using IGRP, 
which caused more problems than it solved.  As I mentioned, we may be 
doing a major redesign in the medium term and this is something we can 
consider again.
/JMcL 

>Peter's summarisation of the problem (pardon the pun) is a very good one -
>and very useful, as I hadn't really considered the broader case of
>overlapping summarisation in general.

I have a question for Peter a little later -- it can be interesting 
to contrast how different routing software deals with an implementer 
choice, and I don't know how JunOS deals with a particular situation.

>The chance of a major redesign
>simply to fix this problem is approximately the same chance as me winning
>an Olympic medal, but we may well be doing a major redesign/readdressing
>"soon" anyway, so that is something I can add to the list of
>considerations - then it may be quite feasible to put all the sites on
>Rtr1 into area 2.1.0.0 and all the sites on Rtr2 into area 2.2.0.0.  Any
>thoughts on where the local ethernets should go if we did that?  I guess
>whatever area they go in would have to be defined on both routers, and
>that might bring up issues of where we summarise again.  Hmm.  I'll have
>to think about that.

>
>One thing I'

Re: OSPF design [7:40269]

2002-04-04 Thread Howard C. Berkowitz

Jenny,

First, I apologize for not giving more of a response earlier, but 
it's been a crazy few days...three people in my office, including 
myself, have had close relatives/friends in surgery and there have 
been a lot of distractions.

I'm going to post and elaborate a bit on some observations I sent to 
you earlier, but I'm interested in why and how you have so much core 
trouble.  Could you give us an idea of the number of routes and of 
routers, and the stability of both, in the non-backbone areas?  Are 
the ABRs and any pure backbone routers doing any other 
processor-intensive tasks?

There can be creative solutions if you think outside the traditional 
OSPF box. Hypothetically, if your address plan split geographically, 
it might even be an idea to have an eastern and western OSPF domain 
(i.e., an area 0.0.0.0 and a set of nonzero areas), linked by 
redundant static routes or possibly BGP.  The latter is especially 
useful if you have multiple ISP connections.  Remember also that a 
router can have multiple OSPF processes, so the same router could 
participate in different domains. I assume your user population 
stretches across at least three time zones, so this sort of redesign 
might localize some core thrashing.

>Peter's summarisation of the problem (pardon the pun) is a very good one -
>and very useful, as I hadn't really considered the broader case of
>overlapping summarisation in general.

I have a question for Peter a little later -- it can be interesting 
to contrast how different routing software deals with an implementer 
choice, and I don't know how JunOS deals with a particular situation.

>The chance of a major redesign
>simply to fix this problem is approximately the same chance as me winning
>an Olympic medal, but we may well be doing a major redesign/readdressing
>"soon" anyway, so that is something I can add to the list of
>considerations - then it may be quite feasible to put all the sites on
>Rtr1 into area 2.1.0.0 and all the sites on Rtr2 into area 2.2.0.0.  Any
>thoughts on where the local ethernets should go if we did that?  I guess
>whatever area they go in would have to be defined on both routers, and
>that might bring up issues of where we summarise again.  Hmm.  I'll have
>to think about that.

>
>One thing I'm not clear on, though, is why the problem (reportedly)
>happened before we upgraded to IOS 12.1 - so before a route to null0 was
>used for the summarised networks (we didn't add one manually).  Any ideas?
>  I can understand why it's happening now, so this is more for my curiosity
>and understanding.
>
>Peter, when you say that the solution could involve "less specific
>summaries" - do you really mean more specific summaries?  Summarising less
>drastically (e.g. summarising each site separately) isn't a good solution
>in this particular case because it creates too much load in the core -
>that's how we used to do it but it created other problems.

One of the interesting things about the OSPF specification is that it 
leaves a lot of room to the implementer on handling summarization 
when some of the more-specific routes become unreachable from one ABR 
in a multiple-ABR area.  I know of at least two ways this has been 
implemented, and I wish both were selectable -- there's a place for 
each.

With IOS, if you have two ABRs in the same area, announcing the same 
summary, and the area becomes partitioned, the summaries continue to 
be announced. The rationale here is that the greater stability is 
worth some loss in connectivity. In other words, it's a static 
process of generating the summary.

In Bay RS, in the same situation, you tell the router what 
more-specifics belong to a summary.  If any of them become 
unreachable, the ABR stops announcing the summary and announces the 
remaining more-specifics into the core. The different rationale here 
is that accuracy is more important than increased route thrashing.

As you should be able to see, each of these can be valid assumptions 
depending on your network objectives.  Peter, how does JunOS deal 
with this situation?
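The two behaviours are easier to compare side by side if each one is modelled as a function of which component prefixes an ABR can still reach. The Python sketch below only illustrates the policies as described above; it is not vendor code. The function names are hypothetical, the /16s are borrowed from the WAN links mentioned elsewhere in this thread, and Rtr1 is assumed to have lost sight of Rtr2's sites after a partition.

```python
from ipaddress import ip_network

def static_policy(summary, components, reachable):
    """IOS-style, as described above: keep announcing the summary as long
    as at least one configured component is still reachable."""
    if any(c in reachable for c in components):
        return [summary]
    return []

def accurate_policy(summary, components, reachable):
    """Bay RS-style, as described above: announce the summary only while
    every listed component is reachable; otherwise withdraw it and
    announce the components that are left."""
    if all(c in reachable for c in components):
        return [summary]
    return [c for c in components if c in reachable]

summary = ip_network("2.0.0.0/9")
# Hypothetical component list: the area 2.1.0.0 WAN /16s on both ABRs
components = [ip_network(p) for p in
              ("2.101.0.0/16", "2.109.0.0/16", "2.104.0.0/16", "2.120.0.0/16")]
# Assumed state of Rtr1 after the partition: only its own sites are reachable
rtr1_reachable = {ip_network("2.101.0.0/16"), ip_network("2.109.0.0/16")}

print([str(p) for p in static_policy(summary, components, rtr1_reachable)])
# ['2.0.0.0/9']
print([str(p) for p in accurate_policy(summary, components, rtr1_reachable)])
# ['2.101.0.0/16', '2.109.0.0/16']
```

The first policy keeps the core stable but can attract traffic the ABR cannot deliver; the second tells the core the truth at the price of more prefixes and more churn.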

What would be really nice is if Cisco extended BGP conditional 
advertisement to IGPs, and introduced a knob to override the default 
behavior with conditional advertisement.

>I think in this case I'll be going for the "protect against partitioning"
>solution and bung in another cable.


Sounds good.  A general rule -- always have at least two paths 
between pairs of ABRs in an area.
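A rule like this is easy to check against a model of the area. The sketch below assumes a made-up area 2.1.0.0 topology (SwA and SwB are hypothetical stand-ins for the local ethernets) and reports every single link or switch whose failure would leave Rtr1 and Rtr2 with no intra-area path between them.

```python
from collections import deque

def connected(links, a, b):
    """Breadth-first search over an undirected list of (node, node) links."""
    adj = {}
    for x, y in links:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_link_failures(links, a, b):
    """Links whose loss alone separates a from b."""
    return [link for link in links
            if not connected([l for l in links if l != link], a, b)]

def single_node_failures(links, a, b):
    """Intermediate nodes (switches) whose loss alone separates a from b."""
    nodes = {n for link in links for n in link} - {a, b}
    return sorted(n for n in nodes
                  if not connected([l for l in links if n not in l], a, b))

# Hypothetical area 2.1.0.0 adjacency: Rtr2 hangs off a single switch
area_links = [("Rtr1", "SwA"), ("SwA", "Rtr2"), ("SwA", "SwB"), ("SwB", "Rtr1")]
print(single_link_failures(area_links, "Rtr1", "Rtr2"))   # [('SwA', 'Rtr2')]
print(single_node_failures(area_links, "Rtr1", "Rtr2"))   # ['SwA']

# With a second direct cable between the ABRs in area 2.1.0.0
with_cable = area_links + [("Rtr1", "Rtr2")]
print(single_link_failures(with_cable, "Rtr1", "Rtr2"))   # []
print(single_node_failures(with_cable, "Rtr1", "Rtr2"))   # []
```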

>
>Thanks for comments - very useful.
>
>JMcL

Re: OSPF design [7:40269]

2002-04-04 Thread [EMAIL PROTECTED]

Peter's summarisation of the problem (pardon the pun) is a very good one - 
and very useful, as I hadn't really considered the broader case of 
overlapping summarisation in general.  The chance of a major redesign 
simply to fix this problem is approximately the same chance as me winning 
an Olympic medal, but we may well be doing a major redesign/readdressing 
"soon" anyway, so that is something I can add to the list of 
considerations - then it may be quite feasible to put all the sites on 
Rtr1 into area 2.1.0.0 and all the sites on Rtr2 into area 2.2.0.0.  Any 
thoughts on where the local ethernets should go if we did that?  I guess 
whatever area they go in would have to be defined on both routers, and 
that might bring up issues of where we summarise again.  Hmm.  I'll have 
to think about that. 

One thing I'm not clear on, though, is why the problem (reportedly) 
happened before we upgraded to IOS 12.1 - so before a route to null0 was 
used for the summarised networks (we didn't add one manually).  Any ideas? 
 I can understand why it's happening now, so this is more for my curiosity 
and understanding.
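One possible explanation: even without a null0 route, RFC 2328 section 16.2 step (3) tells an ABR to ignore a Type 3 summary-LSA that equals one of its own configured area ranges while that range is active, i.e. while at least one intra-area network inside it is still reachable. The Python sketch below is a toy model under that assumption. The table layout, next-hop labels and function names are hypothetical, not IOS internals. It shows Rtr1 ending up with no usable route to a Rtr2-side site either way.

```python
from ipaddress import ip_address, ip_network

def accept_summary_lsa(lsa_prefix, own_ranges, intra_area_nets):
    """RFC 2328 16.2 step (3), simplified: ignore a Type 3 summary-LSA that
    equals one of our configured area ranges while that range is active."""
    for rng in own_ranges:
        active = any(net.subnet_of(rng) for net in intra_area_nets)
        if lsa_prefix == rng and active:
            return False
    return True

def lookup(table, dest):
    """Longest-prefix match over a {prefix: next_hop} dict."""
    dest = ip_address(dest)
    matches = [p for p in table if dest in p]
    if not matches:
        return None
    return table[max(matches, key=lambda p: p.prefixlen)]

area_range = ip_network("2.0.0.0/9")          # area 2.1.0.0 range 2.0.0.0 255.128.0.0
# Rtr1 after the partition: only its own sites are still intra-area
rtr1_intra = {ip_network("2.101.0.0/16"): "WAN-a",
              ip_network("2.109.0.0/16"): "WAN-b"}
rtr2_summary = ip_network("2.0.0.0/9")        # Type 3 LSA from Rtr2, seen via area 0

def build_table(with_null0):
    table = dict(rtr1_intra)
    if with_null0:
        table[area_range] = "Null0"           # discard route for the summary
    if accept_summary_lsa(rtr2_summary, [area_range], rtr1_intra):
        table.setdefault(rtr2_summary, "Rtr2 via area 0")
    return table

print(lookup(build_table(with_null0=True), "2.120.1.1"))    # Null0
print(lookup(build_table(with_null0=False), "2.120.1.1"))   # None
```

With the discard route, traffic from the core for 2.120.x.x is dropped on Null0; without it, Rtr1 still has no route, because Rtr2's identical summary is ignored while Rtr1's own range stays active.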

Peter, when you say that the solution could involve "less specific 
summaries" - do you really mean more specific summaries?  Summarising less 
drastically (e.g. summarising each site separately) isn't a good solution 
in this particular case because it creates too much load in the core - 
that's how we used to do it but it created other problems.
I think in this case I'll be going for the "protect against partitioning" 
solution and bung in another cable. 

Thanks for comments - very useful.

JMcL

- Forwarded by Jenny Mcleod/NSO/CSDA on 05/04/2002 08:47 am -


"Priscilla Oppenheimer" 
Sent by: [EMAIL PROTECTED]
05/04/2002 04:39 am
Please respond to "Priscilla Oppenheimer"

 
To: [EMAIL PROTECTED]
cc: 
Subject:Re: OSPF design [7:40269]


At 11:59 AM 4/4/02, Chuck wrote:
>that was going to be my guess as well. I've done a number of lab experiments
>with similar themes, and have in my own mind at least, confirmed what is
>stated in the RFC - that the only serious routing issue with partitioned
>non-backbone areas results from overlapping

She does seem to have overlapping summarization, if that makes sense. She
said:

The area range statements on Rtr2 are...
[various area 0 range statements snipped]
  area 2.1.0.0 range 2.0.0.0 255.128.0.0
  area 2.2.0.0 range 2.128.0.0 255.224.0.0

On Rtr1 the statements are...
[same area 0 range statements snipped]
  area 2.1.0.0 range 2.0.0.0 255.128.0.0

If you look at her ASCII art e-mail, you'll see that the WAN links were not 
assigned contiguously unless I'm missing something. Rtr1 has 2.101.0.0/16 
and 2.109.0.0/16. Rtr2 has 2.120.0.0/16, 2.104.0.0/16, and 2.130.0.0/16.

It's probably too late now, but perhaps if all the WAN links connected to 
Rtr 1 had been summarizable into a group that was distinct from the WAN 
links connected to Rtr 2, she wouldn't have the problem?? (Of course, she 
has that area 2.2.0.0 to deal with too, but perhaps it could be something 
different entirely)

But I don't think she's looking for a redesign. She's looking for a quick 
fix for now. What did you guys think of the idea of adding another direct 
connection between the two switches and putting it in area 2.1.0.0?

Priscilla



Re: OSPF design [7:40269]

2002-04-04 Thread Kent Yu

Jenny,

I think you may want to build a tunnel between the area 0 ethernet
interfaces of R1 and R2, then put the tunnel into area 2.1.0.0.

This just creates another connection from the area 2.1.0.0 side of R2 to R1,
so that the area 2.1.0.0 subnets behind R2 could be installed into R1's
routing table in case the switches fail. Since the OSPF cost of the tunnel
should be higher, traffic should not use the tunnel link under normal
conditions.
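The reasoning about cost can be checked with an ordinary shortest-path calculation. The sketch below runs Dijkstra over a hypothetical slice of area 2.1.0.0. The costs (10 per local ethernet hop, 1000 for the tunnel) are assumptions chosen for illustration; in practice the tunnel would use whatever cost the interface ends up with, or one set explicitly with ip ospf cost.

```python
import heapq

def spf(graph, src):
    """Dijkstra: return {node: (cost, path)} for every node reachable from src."""
    best = {src: (0, [src])}
    heap = [(0, src, [src])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if cost > best[node][0]:
            continue                      # stale heap entry
        for nbr, link_cost in graph.get(node, {}).items():
            new = cost + link_cost
            if nbr not in best or new < best[nbr][0]:
                best[nbr] = (new, path + [nbr])
                heapq.heappush(heap, (new, nbr, path + [nbr]))
    return best

def undirected(edges):
    """Build an adjacency dict from (a, b, cost) tuples."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

local_lans = [("Rtr1", "Switch", 10), ("Switch", "Rtr2", 10)]
tunnel = [("Rtr1", "Rtr2", 1000)]         # hypothetical, deliberately high cost

normal = spf(undirected(local_lans + tunnel), "Rtr1")["Rtr2"]
failed = spf(undirected(tunnel), "Rtr1")["Rtr2"]     # local switch down
print(normal)   # (20, ['Rtr1', 'Switch', 'Rtr2'])
print(failed)   # (1000, ['Rtr1', 'Rtr2'])
```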

HTHs.
Kent Yu

- Original Message -
From: "Priscilla Oppenheimer" 
To: 
Sent: Thursday, April 04, 2002 1:39 PM
Subject: Re: OSPF design [7:40269]



Re: OSPF design [7:40269]

2002-04-04 Thread Kent Yu

Jenny,

I think you may want to try building a tunnel between the area 0 ethernet
interfaces of R1 and R2, then put the tunnel into area 2.1.0.0.

This just creates another connection from the area 2.1.0.0 side of R2 to R1,
so that the area 2.1.0.0 subnets behind R2 could be installed into R1's
routing table in case the switches fail. Since the OSPF cost of the tunnel
should be higher, traffic should not use the tunnel link under normal
conditions.

HTHs.

Kent Yu


- Original Message -
From: "Priscilla Oppenheimer" 
To: 
Sent: Thursday, April 04, 2002 1:39 PM
Subject: Re: OSPF design [7:40269]



Re: OSPF design [7:40269]

2002-04-04 Thread Priscilla Oppenheimer

At 11:59 AM 4/4/02, Chuck wrote:
>that was going to be my guess as well. I've done a number of lab experiments
>with similar themes, and have in my own mind at least, confirmed what is
>stated in the RFC - that the only serious routing issue with partitioned
>non-backbone areas results from overlapping

She does seem to have overlapping summarization, if that makes sense. She
said:

The area range statements on Rtr2 are...
[various area 0 range statements snipped]
  area 2.1.0.0 range 2.0.0.0 255.128.0.0
  area 2.2.0.0 range 2.128.0.0 255.224.0.0

On Rtr1 the statements are...
[same area 0 range statements snipped]
  area 2.1.0.0 range 2.0.0.0 255.128.0.0

If you look at her ASCII art e-mail, you'll see that the WAN links were not 
assigned contiguously unless I'm missing something. Rtr1 has 2.101.0.0/16 
and 2.109.0.0/16. Rtr2 has 2.120.0.0/16, 2.104.0.0/16, and 2.130.0.0/16.

It's probably too late now, but perhaps if all the WAN links connected to 
Rtr 1 had been summarizable into a group that was distinct from the WAN 
links connected to Rtr 2, she wouldn't have the problem?? (Of course, she 
has that area 2.2.0.0 to deal with too, but perhaps it could be something 
different entirely)
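This can be checked with Python's standard ipaddress module: find the smallest block covering each ABR's area 2.1.0.0 WAN links and see whether the two blocks overlap. The smallest_cover helper is made up for this sketch; 2.130.0.0/16 is left out because it falls inside the 2.128.0.0/11 range configured for area 2.2.0.0.

```python
from ipaddress import ip_network

def smallest_cover(prefixes):
    """Longest-prefix supernet that contains every prefix in the list."""
    nets = [ip_network(p) for p in prefixes]
    cover = nets[0]
    while not all(n.subnet_of(cover) for n in nets):
        cover = cover.supernet()
    return cover

# The area range statements above, converted from dotted masks to CIDR
area_2_1 = ip_network("2.0.0.0/255.128.0.0")
area_2_2 = ip_network("2.128.0.0/255.224.0.0")
print(area_2_1, area_2_2)     # 2.0.0.0/9 2.128.0.0/11

rtr1 = smallest_cover(["2.101.0.0/16", "2.109.0.0/16"])
rtr2 = smallest_cover(["2.120.0.0/16", "2.104.0.0/16"])
print(rtr1, rtr2)             # 2.96.0.0/12 2.96.0.0/11
print(rtr1.overlaps(rtr2))    # True
```

Rtr2's tightest aggregate (2.96.0.0/11) contains Rtr1's (2.96.0.0/12), so with the current numbering neither ABR can advertise a block that describes only its own sites.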

But I don't think she's looking for a redesign. She's looking for a quick 
fix for now. What did you guys think of the idea of adding another direct 
connection between the two switches and putting it in area 2.1.0.0?

Priscilla



Re: OSPF design [7:40269]

2002-04-04 Thread Chuck

that was going to be my guess as well. I've done a number of lab experiments
with similar themes, and have in my own mind at least, confirmed what is
stated in the RFC - that the only serious routing issue with partitioned
non-backbone areas results from overlapping subnets.

Chuck

Re: OSPF design [7:40269]

2002-04-04 Thread Peter van Oene

Hi Jenny,

Is it safe to say that your problem is that when your non-backbone area 
becomes partitioned, you lose reachability to one side of the 
partition?  When you use large summaries to describe entire areas and have 
multiple entry points into those areas themselves, this is a normal 
occurrence.  If this is the problem, the solution likely involves the use 
of less specific summaries per ABR, and/or greater L2 resiliency to protect 
against partitions.  If that's not the problem, can you indicate where I've 
misread the problem description?
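Here is a toy model of that situation in Python: both ABRs advertise the same large summary, the core simply forwards to the ABR it prefers, and after a partition that ABR can only deliver to its own half of the area. The costs, reachability sets and the core_forward function are hypothetical, and the model ignores most of OSPF's real route selection.

```python
from ipaddress import ip_address, ip_network

summary = ip_network("2.0.0.0/9")   # the same range advertised by both ABRs

# Assumed post-partition state: what each ABR can still deliver in area 2.1.0.0
abr_reach = {
    "Rtr1": [ip_network("2.101.0.0/16"), ip_network("2.109.0.0/16")],
    "Rtr2": [ip_network("2.104.0.0/16"), ip_network("2.120.0.0/16")],
}
# Hypothetical cost from the core to each ABR; Rtr1 is preferred here,
# e.g. because Rtr2's own core link is normally idle dial-up
core_cost = {"Rtr1": 10, "Rtr2": 100}

def core_forward(dest):
    """Return (chosen ABR, whether that ABR can actually deliver)."""
    dest = ip_address(dest)
    if dest not in summary:
        return None, False
    abr = min(core_cost, key=core_cost.get)   # identical summaries: cheapest wins
    delivered = any(dest in net for net in abr_reach[abr])
    return abr, delivered

print(core_forward("2.101.1.1"))   # ('Rtr1', True)
print(core_forward("2.120.1.1"))   # ('Rtr1', False), black-holed at Rtr1
```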

Thanks

Pete



At 09:05 PM 4/2/2002 -0500, [EMAIL PROTECTED] wrote:
>Hi all,
>
>This is actually a real-life scenario, but I think it throws up some
>interesting points about OSPF that some people may not have come across.
>And it has a couple of bits that I don't understand.  Please excuse the
>verbosity.
>
>Currently, (part of) this particular network is as described below.  It
>normally works fine, but during certain types of failures, connectivity
>breaks although there is still a physical path.  I am contemplating what
>the best way to fix it would be, and would be interested in comments.
>
>Set-up - I don't think my ASCII art is up to this but I'll give it a go if
>the description isn't clear enough:
>
>Two ABRs (Rtr1 and Rtr2), running IOS 12.1, connected to each other by a
>direct ethernet cable in area 0, and also by several local ethernet
>networks in area 2.1.0.0.  The details of the local ethernets can probably
>remain a fluffy cloud, but note that failure of a single component can
>potentially cause all area 2.1.0.0 neighbour connectivity between Rtr1 and
>Rtr2 to be lost, although the local ethernets may remain up on one or both
>routers.
>
>Both routers have a connection back to the core of the network (on Rtr2 it
>is dialup, so not usually active), which is in area 0.  Both routers have
>WAN links to several sites (not dual-homed - each site has a link to only
>one ABR), in area 2.1.0.0.  Rtr2 may also have WAN links to several sites
>in area 2.2.0.0, but that's probably not too relevant.
>
>Both ABRs summarise the networks in area 2.1.0.0 to a single summary
>network (Rtr2 summarises the networks in 2.2.0.0, if any, to another
>summary network).
>
>This usually works fine - traffic from the core to sites connected to Rtr2
>(in area 2.1.0.0) travels from Rtr1 to Rtr2 across the local ethernets
>(area 2.1.0.0), and in reverse from Rtr2 to Rtr1 across the Area 0
>ethernet.  This, while perhaps not ideal, is as expected, and works well
>under normal circumstances.  (If you're not sure why this is expected,
>read up on hot potato routing policy - Howard gave a good description in
>the context of stub areas in
>http://www.groupstudy.com/archives/cisco/21/msg01579.html)
>
>The problem happens if the area 2.1.0.0 neighbour connections between Rtr1
>and Rtr2 are lost.  Even though there is still an area 0 link between
>them, area 2.1.0.0 sites connected to Rtr2 lose connectivity to the core.
>Area 2.2.0.0 sites are OK (this is good - I'd be really confused if they
>lost it too).
>Despite Doyle claiming that partitioned non-backbone areas are not a
>problem (he does, on page 462 of Routing TCP/IP Vol 1), it seems they can
>be.  As far as I can see, it's because when summarising the 2.1.0.0
>networks, Rtr1 also installs a route to null0 for the summary route -
>which overrides the summary route that Rtr2 generates (and which would
>otherwise cover the 'lost' sites).
>
>I can see a couple of possibilities for fixing this...
>1) Install a second direct ethernet cable between Rtr1 and Rtr2, in area
>2.1.0.0.  This may not be particularly elegant, but it should be
>comparatively easy to do and effective (there are plenty of spare ethernet
>ports).  It also has the useful side-effect of getting the through traffic
>off the local ethernets.
>
>2) Use the "no discard-route internal" command - this doesn't appear to be
>documented but is mentioned at
>http://www.cisco.com/warp/public/104/3.html#12.0
>I haven't tested it, but I think it should prevent the null0 route from
>being installed by Rtr1, so my theory is that then the summary generated
>by Rtr2 should come into play.  This, of course, goes against all Cisco
>recommendations, which say that having the null0 route is A Good Thing to
>prevent routing loops.
>
>3) Muck about with the arrangement of switches within the internal
>networks.  I think this will cause more trouble than it's worth, since any
>rearrangement has to be duplicated at twenty sites.  In theory at least,
>the whole network may be redesigned from scratch over the next year or so,
>so a quick and dirty fix isn't necessarily a problem.
>
>BUT... I am also not positive that my understanding of what is happening
>and why is correct, because the support guys have told me that this
>problem has been around since we were running IOS 11.2 on the ABRs (not
>that long ago, believe it or not), and I'm pretty sure that no route to
>null0 was bein