Just wanted to post a brief message that I have been able to confirm the
'Ops Slave RTunnel' script I created to detect down reverse tunnels does
appear to work.

Normally the output looks something like this:
        RTunnel Okay : prot: tcp recieved: 0 local: 127.0.0.1:25806 remote: 0.0.0.0:* state: LISTEN

But I queried the runtime table for non-okay states for the service and
saw one event, two days after my initial deployment: 'RTunnel NOT Okay,
initiating restart'.

It was driving me nuts that the problem I was setting out to address
seemed to go away once I had a solution in place.
As it turns out, the problem did occur but was resolved too quickly to be
detected as a hard state event.
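
For anyone wondering why a short outage never shows up as a hard state:
in Nagios-style checking a failure only becomes hard once it has repeated
for max_check_attempts checks in a row. A rough Python sketch of that rule
follows. The function name and sample values are mine for illustration and
are not taken from Opsview.

```python
def hard_failure_indexes(results, max_check_attempts=3):
    """Return the positions at which a failure turns into a hard state.

    results is a list of booleans, one per check (True = check passed).
    A problem goes hard only when max_check_attempts consecutive checks
    have failed; any passing check resets the counter.
    """
    hard = []
    failures = 0  # consecutive failures seen so far
    for i, ok in enumerate(results):
        if ok:
            failures = 0
        else:
            failures += 1
            if failures == max_check_attempts:
                hard.append(i)
    return hard


# A single failed check followed by a recovery never goes hard.
print(hard_failure_indexes([True, False, True, True]))  # prints []
```

With a restart kicking in on the first failed check, the next check passes
and the counter never reaches the threshold, which matches what I saw.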

I provided the script to one other Opsview list member so I'm interested
to see if it works for him as well.

James Whittington
VC3, Inc.

-----Original Message-----
From: James Whittington 
Sent: Tuesday, May 19, 2009 10:29 PM
To: 'Opsview Users'
Subject: RE: [opsview-users] periodic autossh issues with
reversetunneldropping

I hereby retract my previous comment about the author of autossh saying
the monitoring function of autossh was buggy.
I was up late the previous night looking to see if there were known
issues with autossh not seeing the reverse tunnel drop and I was sure
someone termed the monitoring port as buggy but I can't even find that
reference now.

The readme suggests that the ServerAliveInterval and ServerAliveCountMax
SSH config options are preferred over the monitoring port, but
that's quite a stretch from being buggy :<)...
"In many ways this may be a better solution than the monitoring port" 
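
To make the readme's suggestion concrete, here is a rough Python sketch
that assembles an autossh command line with the monitoring port switched
off (-M 0) and the two keepalive options passed through to ssh. The host
name, the port numbers, and the 30 second / 3 attempt values are
placeholders I made up, not the settings Opsview uses.

```python
def build_autossh_command(remote_host, listen_port, target_port,
                          alive_interval=30, alive_count_max=3):
    """Build an autossh argument list that relies on SSH keepalives.

    All parameter values are illustrative assumptions.
    """
    return [
        "autossh",
        "-M", "0",  # 0 turns off autossh's own monitoring port
        "-N",       # no remote command, tunnel only
        "-o", f"ServerAliveInterval={alive_interval}",
        "-o", f"ServerAliveCountMax={alive_count_max}",
        # Reverse tunnel: listen_port on the remote end is forwarded
        # back to target_port on this machine.
        "-R", f"{listen_port}:127.0.0.1:{target_port}",
        remote_host,
    ]


# Hypothetical host and ports, for illustration only.
print(" ".join(build_autossh_command("master.example.com", 25806, 22)))
```

With these example values ssh would give up after roughly 90 seconds
without a reply, and autossh would then start a fresh connection.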

For the most part autossh is doing its job, but I now have a
check_opsview_slave_rtunnel script that is running on all the slave
servers and will restart the slave service if the reverse tunnel is
down.  So far I have not detected an event where a restart was required,
so I am not entirely sure it really works yet.
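
The general shape of such a check, as a minimal Python sketch rather than
the actual check_opsview_slave_rtunnel script: look for the tunnel port in
LISTEN state in `netstat -tln` style output and trigger a restart if it is
missing. The function names are hypothetical, and the restart action is
passed in as a callable because the real restart command is not shown here.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def tunnel_is_listening(netstat_output, port):
    """Return True if a TCP socket on the given local port is in LISTEN."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if local_addr.endswith(f":{port}") and state == "LISTEN":
            return True
    return False


def check_rtunnel(netstat_output, port, restart):
    """Return (exit_code, message); call restart() when the tunnel is down."""
    if tunnel_is_listening(netstat_output, port):
        return OK, "RTunnel Okay"
    restart()
    return CRITICAL, "RTunnel NOT Okay, initiating restart"


sample = "tcp        0      0 127.0.0.1:25806    0.0.0.0:*    LISTEN"
print(check_rtunnel(sample, 25806, restart=lambda: None))
```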

It didn't even occur to me how close I was to writing a simple Perl
version of autossh while trying to solve the issues I was having :<)..

Someone asked me recently what happens if the connection between master
and slave is severed for a period of time.
I know monitoring continues to occur on the slave. In my case all pages
come from the master, so the slave down notification would be the first
issue, and after 30 minutes hosts go into an unknown state. I wasn't
sure what happens to the service check data from the slave.

Do the slave servers just try to send results regardless of the
connectivity between the master and slave?
Since you reference buffering the slave results I'm guessing that's
exactly how it works currently.    


The buffering of the slave results sounds like a good feature and a way to
have complete trending data. I am wondering, however, whether the master
server would get confused by a batch of old data flowing in. If a host
went from unknown to ok, would the end of the downtime be recorded as the
timestamp of the first "ok" state data, or as the time that first "ok"
data was received by the master?
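
To spell out the two possibilities, here is a small Python sketch. The
record layout and field names are hypothetical and only illustrate the
question; I don't know how Opsview would actually store this.

```python
from dataclasses import dataclass


@dataclass
class BufferedResult:
    checked_at: int   # epoch seconds when the slave ran the check
    received_at: int  # epoch seconds when the master received the result
    state: str        # e.g. "OK" or "CRITICAL"


def recovery_candidates(results):
    """Return (checked_at, received_at) for the earliest "OK" result.

    These are the two timestamps the master could use as the end of the
    outage. Returns None if there is no "OK" result at all.
    """
    for result in sorted(results, key=lambda r: r.checked_at):
        if result.state == "OK":
            return result.checked_at, result.received_at
    return None


# A slave cut off for an hour: the checks ran on time but arrived late.
batch = [
    BufferedResult(checked_at=1000, received_at=4600, state="OK"),
    BufferedResult(checked_at=1300, received_at=4601, state="OK"),
]
print(recovery_candidates(batch))  # prints (1000, 4600)
```

In this made-up batch the two answers are an hour apart, which is the gap
that would show up in the trending data.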

Sounds like a fun problem.

James Whittington
VC3, Inc.  
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Ton Voon
Sent: Tuesday, May 19, 2009 10:34 AM
To: Opsview Users
Subject: Re: [opsview-users] periodic autossh issues with
reversetunneldropping


On 16 May 2009, at 06:40, James Whittington wrote:

> I actually think for the reverse ssh setup it would be a good fallback
> check when things don't go as planned with the autossh.
> The alternative to the script was to turn on autossh monitoring which
> the author of the tool says is buggy.

Have you got a reference to this statement?

I'd prefer to use a pure perl method of setting up the reverse ssh  
tunnels, but autossh appeared to solve that problem.

Looking wider, we have a feature we want to add which was to buffer  
the slave results that go up to the master. One piece would be to  
daemonise the "send results to master" process. This could add in the  
"restart opsview-slave" if there was a problem sending results up.
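
A rough illustration of that idea in Python, not Opsview code: results
queue up on the slave, a flush sends them in order, and the first send
failure stops the flush and fires a hook where a "restart opsview-slave"
action could be attached. The class name, the `send` callable, and the
`on_failure` hook are all hypothetical.

```python
from collections import deque


class ResultBuffer:
    """Hold results on the slave until they can be sent to the master."""

    def __init__(self, send, on_failure=None):
        self._queue = deque()
        self._send = send              # callable that raises OSError on failure
        self._on_failure = on_failure  # e.g. a restart action (assumed)

    def add(self, result):
        self._queue.append(result)

    def flush(self):
        """Send queued results oldest first; return how many were sent."""
        sent = 0
        while self._queue:
            try:
                self._send(self._queue[0])
            except OSError:
                # Keep the unsent results for the next attempt.
                if self._on_failure is not None:
                    self._on_failure()
                break
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)
```

A daemon would call flush() on a timer. Nothing is dropped while the link
is down, and the results go up in their original order once it returns.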

Ton

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/listinfo/opsview-users