On Apr 4, 2012, at 11:03 AM, Makewise - Vitor Rui Mendonça wrote:
> I think there's a problem regarding the time between PSE generation and
> poller activation: if OpenNMS shuts down in that time frame (for example),
> then the outage is never generated/resolved.
The core problem is the OpenNMS shutdown. Sure, you can engineer yet another
workaround in the code for this HA issue, but what is the cost? This isn't the
only place where OpenNMS, or any NMS, is vulnerable to a single-instance
shutdown/failure. So where are you going to put your effort, everywhere in
the code? That would be a huge effort and would probably slow things down in
places we can't even imagine yet. In this example, what if the shutdown happens
before the trap is received? That is just as critical as a shutdown after the
trap is PSE'd but before the poll… but at least then you have the trap and the
PSE event. The time you are referring to, between the PSE and "poller
activation", is at most 1 second but, statistically, about 1/2 second. But that
is beside the point: 1) there should be an HA solution for OpenNMS, and 2)
OpenNMS should never have to be shut down unless you are upgrading or repairing
OpenNMS, the JVM, or the system.
> At least in my case, I need to restart OpenNMS sometimes in order
> to "commit" some configuration changes. At present, our biggest
> OpenNMS installation has almost 2000 nodes. I don't think this problem can
> be ignored in an installation of this size.
Depending on your configuration change, you may not have to do that. Let's
focus on not having to restart for any configuration change. It may be that
your change doesn't require restart.
> Instead of checking outages, what about checking the last PSE
> (PassiveStatusEvent) for the status parameter? Yes, the events table tends
> to be a big one, but this operation is only done on OpenNMS startup:
That certainly is an option. I think a combination would work: first check
current outages, then check for any resolving PSE for those outages that is
later than the outage time, then check for any PSE "Downs" with no matching
"Ups" since the last PSE outage was recorded.
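
To make that combination concrete, here is a rough sketch in Python. It is not OpenNMS code: the `Outage` and `PassiveStatusEvent` records, the `key` field, and the `reconcile` function are all hypothetical names invented for illustration, and it assumes only the latest PSE per key matters at startup.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Outage:
    key: str          # hypothetical node/interface/service identifier
    lost_time: float  # time the outage was opened


@dataclass(frozen=True)
class PassiveStatusEvent:
    key: str
    status: str       # "Up" or "Down"
    time: float


def reconcile(open_outages: Iterable[Outage],
              events: Iterable[PassiveStatusEvent]) -> Tuple[List[str], List[str]]:
    """Return (keys whose outage should be resolved, keys needing a new outage)."""
    # Keep only the most recent PSE for each key (assumption: that is the
    # status that should be in effect after startup).
    latest: Dict[str, PassiveStatusEvent] = {}
    for ev in events:
        cur = latest.get(ev.key)
        if cur is None or ev.time > cur.time:
            latest[ev.key] = ev

    # Step 1: the currently open outages, indexed by key.
    open_by_key = {o.key: o for o in open_outages}

    # Step 2: an open outage is resolved by an "Up" later than the outage time.
    to_resolve = sorted(
        k for k, o in open_by_key.items()
        if k in latest
        and latest[k].status == "Up"
        and latest[k].time > o.lost_time
    )

    # Step 3: a "Down" with no later "Up" and no open outage needs an outage.
    to_open = sorted(
        k for k, ev in latest.items()
        if ev.status == "Down" and k not in open_by_key
    )
    return to_resolve, to_open
```

Because it only looks at the latest PSE per key, a Down/Up pair that was missed entirely would not produce a historical outage; it only brings the current state back in line.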
I'd write up an enhancement issue on this… but I still think the core problem
is HA and restart.
David Hustace
The OpenNMS Group, Inc.
+1 919 533 0160 x7734
david_opennms (skype to my direct line)