Re: OSPF design [7:40269]

[EMAIL PROTECTED] Thu, 04 Apr 2002 14:57:55 -0800

Peter's summarisation of the problem (pardon the pun) is a very good one - 
and very useful, as I hadn't really considered the broader case of 
overlapping summarisation in general.  The chance of a major redesign 
simply to fix this problem is approximately the same as my chance of 
winning an Olympic medal, but we may well be doing a major 
redesign/readdressing 
"soon" anyway, so that is something I can add to the list of 
considerations - then it may be quite feasible to put all the sites on 
Rtr1 into area 2.1.0.0 and all the sites on Rtr2 into area 2.2.0.0.  Any 
thoughts on where the local ethernets should go if we did that?  I guess 
whatever area they go in would have to be defined on both routers, and 
that might bring up issues of where we summarise again.  Hmm.  I'll have 
to think about that.


One thing I'm not clear on, though, is why the problem (reportedly) 
happened before we upgraded to IOS 12.1 - so before a route to null0 was 
used for the summarised networks (we didn't add one manually).  Any ideas? 
 I can understand why it's happening now, so this is more for my curiosity 
and understanding.
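
The current (IOS 12.1) behaviour can be pictured as a toy longest-prefix 
lookup.  The Python below is only a simplified sketch, not router code: the 
table contents, the `lookup` and `net` helpers, and the next-hop labels are 
all made up for illustration, and the assumption that Rtr1 keeps its own 
null0 route in preference to Rtr2's identical summary is taken from the 
description in the original post rather than verified on a router.

```python
import ipaddress

def net(prefix):
    """Shorthand for building a network object (illustrative helper)."""
    return ipaddress.ip_network(prefix)

def lookup(table, destination):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(n, hop) for n, hop in table if addr in n]
    if not matches:
        return None
    # Longest prefix wins, as in an ordinary routing table lookup.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

VIA_RTR2 = "Rtr2 via area 2.1.0.0"

# Hypothetical view of Rtr1's table while area 2.1.0.0 is intact.
# Assumption: Rtr2's summary for the same 2.0.0.0/9 prefix does not
# displace Rtr1's own null0 route, so it is not listed here.
rtr1_normal = [
    (net("2.0.0.0/9"), "Null0"),            # route to null0 for the summary
    (net("2.101.0.0/16"), "local WAN link"),
    (net("2.109.0.0/16"), "local WAN link"),
    (net("2.120.0.0/16"), VIA_RTR2),
    (net("2.104.0.0/16"), VIA_RTR2),
]

# After the partition the specific routes learned from Rtr2 disappear.
rtr1_partitioned = [(n, hop) for n, hop in rtr1_normal if hop != VIA_RTR2]

print(lookup(rtr1_normal, "2.120.5.1"))
print(lookup(rtr1_partitioned, "2.120.5.1"))
```

In this model the first lookup follows the specific /16 towards Rtr2, while 
the second falls back to the /9 and is sent to Null0, which matches the 
symptom described for sites behind Rtr2.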

Peter, when you say that the solution could involve "less specific 
summaries" - do you really mean more specific summaries?  Summarising less 
drastically (e.g. summarising each site separately) isn't a good solution 
in this particular case because it creates too much load in the core - 
that's how we used to do it but it created other problems.
I think in this case I'll be going for the "protect against partitioning" 
solution and bung in another cable. 

Thanks for comments - very useful.

JMcL

----- Forwarded by Jenny Mcleod/NSO/CSDA on 05/04/2002 08:47 am -----


"Priscilla Oppenheimer" 
Sent by: [EMAIL PROTECTED]
05/04/2002 04:39 am
Please respond to "Priscilla Oppenheimer"

 
        To:     [EMAIL PROTECTED]
        cc: 
        Subject:        Re: OSPF design [7:40269]


At 11:59 AM 4/4/02, Chuck wrote:
>that was going to be my guess as well. I've done a number of lab
>experiments with similar themes, and have in my own mind at least,
>confirmed what is stated in the RFC - that the only serious routing
>issue with partitioned non-backbone areas results from overlapping

She does seem to have overlapping summarization, if that makes sense. She
said:

The area range statements on Rtr2 are...
[various area 0 range statements snipped]
  area 2.1.0.0 range 2.0.0.0 255.128.0.0
  area 2.2.0.0 range 2.128.0.0 255.224.0.0

On Rtr1 the statements are...
[same area 0 range statements snipped]
  area 2.1.0.0 range 2.0.0.0 255.128.0.0

If you look at her ASCII art e-mail, you'll see that the WAN links were 
not assigned contiguously, unless I'm missing something. Rtr1 has 
2.101.0.0/16 and 2.109.0.0/16. Rtr2 has 2.120.0.0/16, 2.104.0.0/16, and 
2.130.0.0/16.
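
The overlap can be checked mechanically.  The following is a rough sketch 
in Python using only the standard `ipaddress` module.  The helper name 
`smallest_cover` is made up for illustration, the prefixes are the ones 
quoted above, and it is an assumption that 2.130.0.0/16 belongs to area 
2.2.0.0 (it falls inside that area's range, so it is left out of the 
area 2.1.0.0 comparison).

```python
import ipaddress

def smallest_cover(prefixes):
    """Return the longest single prefix that contains every network given.

    Hypothetical helper for illustration; not part of any router software.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    cover = nets[0]
    # Widen the candidate one bit at a time until it contains all networks.
    while not all(n.subnet_of(cover) for n in nets):
        cover = cover.supernet()
    return cover

# The configured ranges, written with the masks from the range statements.
range_2_1 = ipaddress.ip_network("2.0.0.0/255.128.0.0")    # a /9
range_2_2 = ipaddress.ip_network("2.128.0.0/255.224.0.0")  # a /11

rtr1_links = ["2.101.0.0/16", "2.109.0.0/16"]
rtr2_links = ["2.120.0.0/16", "2.104.0.0/16", "2.130.0.0/16"]

def in_area_2_1(prefix):
    """True if the prefix falls inside the area 2.1.0.0 range."""
    return ipaddress.ip_network(prefix).subnet_of(range_2_1)

# Only links inside the area 2.1.0.0 range matter for the overlap question.
rtr1_cover = smallest_cover([p for p in rtr1_links if in_area_2_1(p)])
rtr2_cover = smallest_cover([p for p in rtr2_links if in_area_2_1(p)])

print("Rtr1 tightest summary:", rtr1_cover)
print("Rtr2 tightest summary:", rtr2_cover)
print("Summaries overlap:", rtr1_cover.overlaps(rtr2_cover))
```

With these addresses the tightest summaries come out as 2.96.0.0/12 for 
Rtr1 and 2.96.0.0/11 for Rtr2, so neither router's links can be summarised 
separately without also covering links on the other router.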

It's probably too late now, but perhaps if all the WAN links connected to 
Rtr1 had been summarizable into a group that was distinct from the WAN 
links connected to Rtr2, she wouldn't have the problem? (Of course, she 
has that area 2.2.0.0 to deal with too, but perhaps it could be something 
different entirely....)

But I don't think she's looking for a redesign. She's looking for a quick 
fix for now. What did you guys think of the idea of adding another direct 
connection between the two switches and putting it in area 2.1.0.0?

Priscilla


>Chuck
>
>"Peter van Oene" wrote in message
>news:[EMAIL PROTECTED]...
> > Hi Jenny,
> >
> > Is it safe to say that your problem is that when your non-backbone
> > area becomes partitioned, you lose reachability to one side of the
> > partition?  When you use large summaries to describe entire areas
> > and have multiple entry points into those areas themselves, this is
> > a normal occurrence.  If this is the problem, the solution likely
> > involves the use of less specific summaries per ABR, and/or greater
> > L2 resiliency to protect against partitions.  If that's not the
> > problem, can you indicate where I've misread the problem
> > description?
> >
> > Thanks
> >
> > Pete
> >
> >
> >
> > At 09:05 PM 4/2/2002 -0500, [EMAIL PROTECTED] wrote:
> > >Hi all,
> > >
> > >This is actually a real-life scenario, but I think it throws up some
> > >interesting points about OSPF that some people may not have come
> > >across.  And it has a couple of bits that I don't understand.  Please
> > >excuse the verbosity.
> > >
> > >Currently, (part of) this particular network is as described below.
> > >It normally works fine, but during certain types of failures,
> > >connectivity breaks although there is still a physical path.  I am
> > >contemplating what the best way to fix it would be, and would be
> > >interested in comments.
> > >
> > >Set-up - I don't think my ascii art is up to this but I'll give it a
> > >go if the description isn't clear enough:
> > >
> > >Two ABRs (Rtr1 and Rtr2), running IOS 12.1, connected to each other
> > >by a direct ethernet cable in area 0, and also by several local
> > >ethernet networks in area 2.1.0.0.  The details of the local
> > >ethernets can probably remain a fluffy cloud, but note that failure
> > >of a single component can potentially cause all area 2.1.0.0
> > >neighbour connectivity between Rtr1 and Rtr2 to be lost, although
> > >the local ethernets may remain up on one or both routers.
> > >
> > >Both routers have a connection back to the core of the network (on
> > >Rtr2 it is dialup, so not usually active), which is in area 0.  Both
> > >routers have WAN links to several sites (not dual-homed - each site
> > >has a link to only one ABR), in area 2.1.0.0.  Rtr2 may also have WAN
> > >links to several sites in area 2.2.0.0, but that's probably not too
> > >relevant.
> > >
> > >Both ABRs summarise the networks in area 2.1.0.0 to a single summary
> > >network (Rtr2 summarises the networks in 2.2.0.0, if any, to another
> > >summary network).
> > >
> > >This usually works fine - traffic from the core to sites connected
> > >to Rtr2 (in area 2.1.0.0) travels from Rtr1 to Rtr2 across the local
> > >ethernets (area 2.1.0.0), and in reverse from Rtr2 to Rtr1 across the
> > >Area 0 ethernet.  This, while perhaps not ideal, is as expected, and
> > >works well under normal circumstances.  (If you're not sure why this
> > >is expected, read up on hot potato routing policy - Howard gave a
> > >good description in the context of stub areas in
> > >http://www.groupstudy.com/archives/cisco/200001/msg01579.html)
> > >
> > >The problem happens if the area 2.1.0.0 neighbour connections
> > >between Rtr1 and Rtr2 are lost.  Even though there is still an area 0
> > >link between them, area 2.1.0.0 sites connected to Rtr2 lose
> > >connectivity to the core.  Area 2.2.0.0 sites are OK (this is good -
> > >I'd be really confused if they lost it too).
> > >Despite Doyle claiming that partitioned non-backbone areas are not a
> > >problem (he does, on page 462 of Routing TCP/IP Vol 1), it seems they
> > >can be.  As far as I can see, it's because when summarising the
> > >2.1.0.0 networks, Rtr1 also installs a route to null0 for the summary
> > >route - which overrides the summary route that Rtr2 generates (and
> > >which would otherwise cover the 'lost' sites).
> > >
> > >I can see a couple of possibilities for fixing this...
> > >1) Install a second direct ethernet cable between Rtr1 and Rtr2, in
> > >area 2.1.0.0.  This may not be particularly elegant, but it should be
> > >comparatively easy to do and effective (there are plenty of spare
> > >ethernet ports).  It also has the useful side-effect of getting the
> > >through traffic off the local ethernets.
> > >
> > >2) Use the "no discard-route internal" command - this doesn't appear
> > >to be documented but is mentioned at
> > >http://www.cisco.com/warp/public/104/3.html#12.0
> > >I haven't tested it, but I think it should prevent the null0 route
> > >from being installed by Rtr1, so my theory is that then the summary
> > >generated by Rtr2 should come into play.  This, of course, goes
> > >against all Cisco recommendations, which say that having the null0
> > >route is A Good Thing to prevent routing loops.
> > >
> > >3) Muck about with the arrangement of switches within the internal
> > >networks.  I think this will cause more trouble than it's worth,
> > >since any rearrangement has to be duplicated at twenty sites.  In
> > >theory at least, the whole network may be redesigned from scratch
> > >over the next year or so, so a quick and dirty fix isn't necessarily
> > >a problem.
> > >
> > >BUT... I am also not positive that my understanding of what is
> > >happening and why is correct, because the support guys have told me
> > >that this problem has been around since we were running IOS 11.2 on
> > >the ABRs (not that long ago, believe it or not), and I'm pretty sure
> > >that no route to null0 was being generated then (summarisation was
> > >the same).
> > >So can anyone explain to me why connectivity would fail even if no
> > >null0 route was being generated?  What am I missing?
> > >And does anyone feel like commenting on the options for fixing it?
> > >
> > >JMcL
________________________

Priscilla Oppenheimer
http://www.priscilla.com




