Re: [Nagios-users] Ring topology parent/child relation Nagios

2008-05-12 Thread Hugo van der Kooij
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mihai Tanasescu wrote:
|> This problem should not exist.
| Nagios --> Router A  --> Router B uplink1+2 ring (and Router B is in a
| ring topology which closes in it)
|
| http://tinypic.com/view.php?pic=11uhx7a&s=3 (this is the logical layout)
|
| Yes. But if you cut the 2 uplinks from Router B, then the Nagios machine
| will see Router B as up but will not be able to reach any other router
| from the ring and will thus alert that all other routers are down (which
| is not true).
| I mean having split the ring into the 2 halves you suggested that:
| C has parent B, D has parent C, E has parent D
| G has parent B, F has parent G
| => B up but B uplinks to C and G down -> alerts that C and G are down
| although they aren't
|
| Can this be eliminated ? (I'm sure the solution should be simple and
| obvious but I'm not being as careful as I should to see it)

A ring config is a nightmare from the perspective of Nagios. The maths
simply do not work: the parent concept assumes a hierarchy without loops,
and a ring is exactly a loop. The best you can do is a halfway model
that will never show the proper state in all cases.

Building a config that keeps the number of down reports to a minimum is
not a simple thing. The key is to cut things in half and make sure you get
the timing right. Each node further away must wait longer to go from
soft fail to hard fail state. The manual handles that subject and it is
mandatory to read it before you even try to use the parent feature.
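
As a rough illustration of the timing idea, here is a minimal Python sketch
that derives a max_check_attempts value from each host's depth in the parent
tree and prints host definitions. The parent map follows the split quoted
above; the base and step values, the "generic-host" template name and the
helper names are assumptions for illustration only, not recommended settings.

```python
# Parent map for the two half-ring chains quoted above (A hangs off Nagios).
PARENTS = {"A": None, "B": "A", "C": "B", "D": "C", "E": "D", "G": "B", "F": "G"}


def depth(host, parents):
    """Count the hops from a host up to the top of the parent tree."""
    hops = 0
    while parents[host] is not None:
        host = parents[host]
        hops += 1
    return hops


def attempts_by_depth(parents, base=3, step=2):
    """Give deeper hosts more check attempts, so they go hard DOWN later.

    base and step are arbitrary illustrative numbers (assumption).
    """
    return {host: base + step * depth(host, parents) for host in parents}


def render_host(name, parent, attempts):
    """Render a partial Nagios host definition as text.

    'generic-host' is an assumed template name; address and check settings
    are expected to come from that template or be added by hand.
    """
    lines = ["define host {",
             "    use                 generic-host",
             f"    host_name           {name}"]
    if parent is not None:
        lines.append(f"    parents             {parent}")
    lines.append(f"    max_check_attempts  {attempts}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    attempts = attempts_by_depth(PARENTS)
    for host in sorted(PARENTS, key=lambda h: depth(h, PARENTS)):
        print(render_host(host, PARENTS[host], attempts[host]))
```

With equal retry intervals, a higher max_check_attempts means a longer wait
before the hard state, so a parent settles before its children do.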

So either spend many hours perfecting a model to get a halfway-there
solution, or accept the extra down reports and learn to interpret them
as an exact way of telling where your ring broke.

There is no simple solution.

Hugo.

- --
[EMAIL PROTECTED]   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

Bored? Click on http://spamornot.org/ and rate those images.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFIKLfgBvzDRVjxmYERAsphAJ0R79rfSgtvCTNXwT0Iaxolv+2S3gCeO4fv
Ut/6lXS+4+udsR2pUbMGY/o=
=1Qrk
-END PGP SIGNATURE-

-
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Ring topology parent/child relation Nagios

2008-05-12 Thread Mihai Tanasescu

>
> This problem should not exist.
Nagios --> Router A  --> Router B uplink1+2 ring (and Router B is in a
ring topology which closes in it)

http://tinypic.com/view.php?pic=11uhx7a&s=3 (this is the logical layout)

Yes. But if you cut the 2 uplinks from Router B, then the Nagios machine
will see Router B as up but will not be able to reach any other router
from the ring and will thus alert that all other routers are down (which
is not true).
I mean having split the ring into the 2 halves you suggested that:
C has parent B, D has parent C, E has parent D
G has parent B, F has parent G
=> B up but B uplinks to C and G down -> alerts that C and G are down
although they aren't

Can this be eliminated ? (I'm sure the solution should be simple and
obvious but I'm not being as careful as I should to see it)


Am I right ?


P.S. Currently I am monitoring each link state (up/down) using SNMP
interface queries (on Cisco routers), and the hosts themselves with
ping/ICMP against loopback interfaces that are propagated throughout the
network via OSPF for reachability.
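
To make the link check concrete, here is a minimal Python sketch of how an
interface status could be turned into a Nagios plugin result. It assumes the
value comes from the standard IF-MIB ifOperStatus object; the SNMP query
itself is left out, and the choice of which states count as WARNING or
CRITICAL, as well as the function and link names, are assumptions.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# IF-MIB ifOperStatus values.
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}


def link_state(if_oper_status, link_name):
    """Map an ifOperStatus integer to (exit_code, message).

    The severity mapping below is an illustrative assumption.
    """
    name = IF_OPER_STATUS.get(if_oper_status)
    if name == "up":
        return OK, f"LINK OK - {link_name} is up"
    if name in ("down", "notPresent", "lowerLayerDown"):
        return CRITICAL, f"LINK CRITICAL - {link_name} is {name}"
    if name in ("testing", "dormant"):
        return WARNING, f"LINK WARNING - {link_name} is {name}"
    return UNKNOWN, f"LINK UNKNOWN - {link_name} status {if_oper_status}"


if __name__ == "__main__":
    # Hypothetical link name; a real plugin would exit with the code.
    code, message = link_state(2, "B uplink1")
    print(code, message)
```

Defined as a service on the router, a check like this reports the first cut
in the ring even while every host still answers ping.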


>
> Because if you cut the ring in 1 place all nodes can still be reached.
> So no router will go down. If you cut it in 2 places you lose part of
> the ring and only get alerts for the nodes directly on the other side of
> the cuts from your perspective.
>
> If you alert on unreachable as well then you get all the alerts you
> tried to get rid of by introducing the parent relation in the first
> place. So don't use them.
>
> You need an additional means of detecting your first cut in the ring as
> all routers can still be reached at that time and you will never know
> you had a problem unless you alert on the actual link conditions.
>




Re: [Nagios-users] Ring topology parent/child relation Nagios

2008-05-12 Thread Hugo van der Kooij
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mihai Tanasescu wrote:
| Mihai Tanasescu wrote:
|> | I have some problems defining the parent/child relationships to reflect
|> | changes and monitoring on the map.
|> |
|> | My topology is something like this:
|> |
|> | Nagios machine --- Router A --- Router B
|> |
|> | Router B --- Router C --- Router D --- Router E --- Router F --- Router B
|> | (ring closing itself)
|> |
|> | but on the Router B ring I can't define parent relationships in a
|> | circular way because nagios refuses to start when it detects this.
|>
|> The whole concept of a ring setup is that a single disaster can not
|> cause a network failure. For this setup I would only follow the ring
|> halfway.
|>
|> So you get 2 chains:
|>
|> Nagios --> A --> B --> C --> D
|> Nagios --> A --> B --> F --> E
|>
|> Make sure you monitor each neighbor on each ring router to make sure the
|> ring is working as expected.
|>
|> If you use dynamic routing you might want to monitor route changes
|> relevant for the proper operation of your ring setup.

| Thanks for the tip but I have one more question which refers to my
| current problem in fact. (I configured sms sending for down events).
|
| In case for example router B loses both its links to C and F (2
| fibercuts on the network), then I will be getting SMSes stating that
| C,D,F,E are down.
| B in fact will not be down as a system but will be unable to reach the
| others.
|
| How could I solve this and avoid sending misleading sms messages
| regarding down events?

This problem should not exist.

Because if you cut the ring in 1 place all nodes can still be reached.
So no router will go down. If you cut it in 2 places you lose part of
the ring and only get alerts for the nodes directly on the other side of
the cuts from your perspective.

If you alert on unreachable as well then you get all the alerts you
tried to get rid of by introducing the parent relation in the first
place. So don't use them.

You need an additional means of detecting your first cut in the ring as
all routers can still be reached at that time and you will never know
you had a problem unless you alert on the actual link conditions.
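
The reasoning above can be checked with a small Python sketch. It models the
links and the two half-ring parent chains from this thread, removes the cut
links, and classifies each host in a simplified way: UP if Nagios can still
reach it, DOWN if it cannot but its parent is up, UNREACHABLE otherwise.
This is a simplification of the real host check logic, and the function
names are made up for the example.

```python
from collections import deque

# Physical links: Nagios - A - B, then the ring B-C-D-E-F-B.
LINKS = [("Nagios", "A"), ("A", "B"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "B")]

# Parent relations for the two chains A -> B -> C -> D and A -> B -> F -> E.
PARENTS = {"A": "Nagios", "B": "A", "C": "B", "D": "C", "F": "B", "E": "F"}


def reachable(links, cuts, start="Nagios"):
    """Return the nodes reachable from start after removing the cut links."""
    cut = {frozenset(link) for link in cuts}
    neighbours = {}
    for a, b in links:
        if frozenset((a, b)) in cut:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def host_states(links, cuts, parents):
    """Classify every host as UP, DOWN or UNREACHABLE (simplified model)."""
    up = reachable(links, cuts)
    states = {}
    for host, parent in parents.items():
        if host in up:
            states[host] = "UP"
        elif parent in up:
            states[host] = "DOWN"          # parent answers, host does not
        else:
            states[host] = "UNREACHABLE"   # blocked behind a failed parent
    return states


if __name__ == "__main__":
    print(host_states(LINKS, [("B", "C")], PARENTS))
    print(host_states(LINKS, [("B", "C"), ("B", "F")], PARENTS))
```

With one cut every host stays UP, so host checks alone never show the first
break. With both of B's ring links cut, only C and F are reported DOWN,
while D and E are UNREACHABLE.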

Now getting the link condition to Nagios is something you need to work
out. Due to the lack of details it will be hard to help you there at the
moment. But consider the links to be the vital services for the host.

Hugo.

- --
[EMAIL PROTECTED]   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFIKKn4BvzDRVjxmYERAqs5AKCQVpx9YEJtti6ghzB6f70MKRsMWwCgmJk5
MYJnCshGVZeHPXVYT2w3JrU=
=Y46N
-END PGP SIGNATURE-



Re: [Nagios-users] Ring topology parent/child relation Nagios

2008-05-12 Thread Mihai Tanasescu
Mihai Tanasescu wrote:
>
> | I have some problems defining the parent/child relationships to reflect
> | changes and monitoring on the map.
> |
> | My topology is something like this:
> |
> | Nagios machine --- Router A --- Router B
> |
> | Router B --- Router C --- Router D --- Router E --- Router F --- Router B
> | (ring closing itself)
> |
> | but on the Router B ring I can't define parent relationships in a
> | circular way because nagios refuses to start when it detects this.
>
> The whole concept of a ring setup is that a single disaster can not
> cause a network failure. For this setup I would only follow the ring
> halfway.
>
> So you get 2 chains:
>
> Nagios --> A --> B --> C --> D
> Nagios --> A --> B --> F --> E
>
> Make sure you monitor each neighbor on each ring router to make sure the
> ring is working as expected.
>
> If you use dynamic routing you might want to monitor route changes
> relevant for the proper operation of your ring setup.
>
> Hugo.
>
Hello Hugo,


Thanks for the tip but I have one more question which refers to my
current problem in fact. (I configured sms sending for down events).

In case for example router B loses both its links to C and F (2
fibercuts on the network), then I will be getting SMSes stating that
C,D,F,E are down.
B in fact will not be down as a system but will be unable to reach the
others.


How could I solve this and avoid sending misleading sms messages
regarding down events?


Thanks,
Mihai



Re: [Nagios-users] Ring topology parent/child relation Nagios

2008-05-12 Thread Hugo van der Kooij
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mihai Tanasescu wrote:

| I have some problems defining the parent/child relationships to reflect
| changes and monitoring on the map.
|
| My topology is something like this:
|
| Nagios machine --- Router A --- Router B
|
| Router B --- Router C --- Router D --- Router E --- Router F --- Router B
| (ring closing itself)
|
| but on the Router B ring I can't define parent relationships in a
| circular way because nagios refuses to start when it detects this.

The whole concept of a ring setup is that a single disaster can not
cause a network failure. For this setup I would only follow the ring
halfway.

So you get 2 chains:

Nagios --> A --> B --> C --> D
Nagios --> A --> B --> F --> E

Make sure you monitor each neighbor on each ring router to make sure the
ring is working as expected.

If you use dynamic routing you might want to monitor route changes
relevant for the proper operation of your ring setup.
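
As a sketch of the "follow the ring halfway" idea, the Python below takes the
ring entry point and the other ring routers in order, and assigns parents
walking in from both sides. The function name and argument layout are made up
for this example; the resulting parents still have to be written into the
host definitions by hand or by a generator.

```python
def split_ring(entry, ring):
    """Split a ring into two parent chains that both start at entry.

    ring lists the other ring nodes in order around the ring, starting
    next to entry on one side and ending next to entry on the other.
    Returns a dict mapping each node to its parent.
    """
    half = (len(ring) + 1) // 2
    clockwise = ring[:half]              # e.g. C, D
    counter = ring[half:][::-1]          # e.g. F, E (walked from the far side)
    parents = {}
    for chain in (clockwise, counter):
        parent = entry
        for node in chain:
            parents[node] = parent
            parent = node
    return parents


if __name__ == "__main__":
    # Ring from this thread: B - C - D - E - F - B
    for node, parent in split_ring("B", ["C", "D", "E", "F"]).items():
        print(f"{node} has parent {parent}")
```

For the ring above this gives C and F with parent B, D with parent C, and E
with parent F, which is the same as the two chains listed.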

Hugo.

- --
[EMAIL PROTECTED]   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFIKDbGBvzDRVjxmYERAt2PAJ986/BS0M0kgZhAgQfROUgG9ct7rwCfUu2u
7lPWytTiYn0B7T7QAYwpj9c=
=BM9j
-END PGP SIGNATURE-
