And don’t forget a separate config for sitemonitor base version 1 versus 
version 2.

 

From: Af [mailto:af-boun...@afmug.com] On Behalf Of Forrest Christian (List 
Account) via Af
Sent: Saturday, October 25, 2014 3:28 PM
To: af
Subject: Re: [AFMUG] Cacti & SiteMonitor: What did I break?

 

Most people end up with a set of three or four configurations.  I.e., a 
SiteMonitor plus an injector is one configuration, and a SiteMonitor by itself 
is another one.

If you put the modules you don't ever monitor at the end of the list then you 
can reuse configurations. I.e., a SiteMonitor and SyncInjector is the same as a 
SiteMonitor, SyncInjector, and POE as far as monitoring goes.
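
A rough way to picture that reuse rule is the sketch below. It is only an illustration: the module names, the `monitored` set, and the function `config_key` are hypothetical and are not part of Cacti or any PacketFlux tool. It trims the never-monitored modules off the end of the bus order, so two sites whose trimmed lists match could share one configuration.

```python
def config_key(bus_order, monitored):
    """Return the leading part of bus_order up to the last monitored module.

    bus_order: module names in expansion-bus order (hypothetical labels).
    monitored: set of module names that actually get polled.
    """
    last = -1
    for position, module in enumerate(bus_order):
        if module in monitored:
            last = position
    return tuple(bus_order[: last + 1])


monitored = {"sitemonitor", "syncinjector"}
site_a = ["sitemonitor", "syncinjector"]
site_b = ["sitemonitor", "syncinjector", "poe"]   # unmonitored POE at the end
site_c = ["sitemonitor", "poe", "syncinjector"]   # POE in front shifts the injector

print(config_key(site_a, monitored) == config_key(site_b, monitored))
print(config_key(site_a, monitored) == config_key(site_c, monitored))
```

Under these assumptions, site_a and site_b compare equal and can share a configuration, while site_c cannot, because the POE sits in front of a module that is monitored.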

On Oct 25, 2014 1:06 PM, "Bill Prince via Af" <af@afmug.com> wrote:

OK.  I think I have an approach. The SiteMonitor plus all its expansion units 
is not the "device".

The "device" is the SiteMonitor plus the index of the expansion unit.

For example:

*       SiteMonitor, index 0 is the SiteMonitor device
*       SiteMonitor, index 1 is the 4-port POE device
*       SiteMonitor, index 2 is the SyncInjector (first instance)
*       SiteMonitor, index 3 is the SyncInjector (second instance)

and so on.

So when you add a SiteMonitor, you just add the SiteMonitor. If you add another 
Packetflux expansion unit, you have to add it knowing which index (AKA "slot") 
it is.  Put the device in a different position, and you need to update the 
index.
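
As a small sketch of that bookkeeping (the list contents mirror the example above; the function name `slot_of` and the idea of keeping the bus order in a Python list are assumptions for illustration only, not something Cacti or the SiteMonitor provides):

```python
def slot_of(bus_order, module, instance=1):
    """Return the index ("slot") of the Nth instance of a module.

    Index 0 is the SiteMonitor itself; expansion units follow in bus order.
    """
    seen = 0
    for index, name in enumerate(bus_order):
        if name == module:
            seen += 1
            if seen == instance:
                return index
    raise LookupError(f"instance {instance} of {module!r} not found on the bus")


bus = ["SiteMonitor", "4-port POE", "SyncInjector", "SyncInjector"]
print(slot_of(bus, "SyncInjector", 1))  # first SyncInjector
print(slot_of(bus, "SyncInjector", 2))  # second SyncInjector

# Moving a unit changes its index, which is why the device entry must be updated.
rearranged = ["SiteMonitor", "SyncInjector", "4-port POE", "SyncInjector"]
print(slot_of(rearranged, "SyncInjector", 1))
```

With the bus laid out as in the example, the two SyncInjectors land at indexes 2 and 3; after the rearrangement the first one moves to index 1.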

bp

On 10/25/2014 10:52 AM, Bill Prince via Af wrote:

Yah.  Except that the index moves around, depending on what's in front of it 
(e.g. 4-port POE versus an 8-port POE).  So I can't depend on what index number 
I'll be using at any given installation.  The index name will have to stay 
static if I ever hope to find it.  Then again, if I install two of anything, 
there will be more than one index with the same description. 

Hmmm.  How to do this.   Maybe I do have to give each device a unique 
description, and then teach cacti to index on the unique description?




bp

On 10/25/2014 10:16 AM, Forrest Christian (List Account) via Af wrote:

They should be offset by a fixed amount.  I.e., subtract 4.

On Oct 25, 2014 10:58 AM, "Bill Prince via Af" <af@afmug.com> wrote:

I think that may be it.  The OID I was using is no longer valid.  So the SNMP 
response that came back had numbers in it, but it also looks like the checksum 
was broken.

Not clear to me why I thought I could do this without doing the index thing.

I hate doing the index thing.




bp

On 10/24/2014 10:32 PM, Forrest Christian (List Account) via Af wrote:

A power cycle and a reboot should be identical in almost every case.  The 
reboot actually triggers a hardware reset internally in the processor, which 
should clear everything out.  Of course as soon as I say that it is identical, 
someone will find an example where it is not.

I'm not where I can look at the trace you sent, but I'm surprised it contains 
errors.  I do know that the unit will return a response which may look like 
this if the oid is invalid.

Did you adjust your oids in cacti after the removal of the mystery expansion 
unit from the table?  If not, this is likely the problem.

In regards to the unit being there from the factory...  My guess is if you had 
this unit listed in there from the get go, then it probably was the expansion 
unit we use to test the expansion bus here.  It's supposed to be factory reset 
before shipping but it would not shock me if it wasn't.   We actually had a 
short period that a largish percentage went out not factory reset due to a 
tester software issue.   Not really a problem but we hate to have them go out 
in any other state.

On Oct 24, 2014 5:08 PM, "Bill Prince via Af" <af@afmug.com> wrote:

You mean from the web GUI?  Sure.

I presume a power cycle does something different from a reboot?

I was always curious about this particular SiteMonitor, as it came up with the 
extra device on the expansion bus from the get-go.  I'd never worried about 
it, and then I saw the discussion about getting rid of old devices with the 
zeroed-serial trick.

Don't go there!  It's a trap!




bp

On 10/24/2014 2:52 PM, George Skorup (Cyber Broadcasting) via Af wrote:

Can you post a screenshot of your expansion, binary and analog tabs?

Also, I bet if you power-cycle it, it will be fine again. I was working with 
Forrest on a bug where the SyncInjector and some other newer modules would 
mysteriously disappear from the bus. He was able to reproduce and get a fixed 
up firmware load for the modules. Something about one thing booting up faster 
than another, or something like that.

On 10/24/2014 4:41 PM, Bill Prince via Af wrote:

Gotcha!

I removed all the Data Sources except one (PWR1).  Suddenly that data was 
making it into cacti.

Then I added back in all the Data Sources coming _JUST_ from the SiteMonitor 
itself.  That also worked.

Then I added in one of the Data Sources from the SyncInjector (sync events), 
which happens to be the only unit on the expansion bus past where I removed the 
non-existent unit.  This broke it again.

So I have apparently uncovered a bug where removing a unit from the expansion 
bus (by zeroing the serial number) causes the SiteMonitor to break SNMP 
responses.  I think it's probably just a bad checksum, but I will leave that 
up to Forrest.  I forwarded the pcap trace to him.

I will probably also swap out the SiteMonitor that has the problem.

Thanks guys!




bp

On 10/24/2014 1:57 PM, Bill Prince via Af wrote:

Then again....

Not sure why I didn't notice this the first (or second) time.  Wireshark is 
telling me I have a malformed packet: either a broken header or a bad 
checksum.  So even though the SNMP response is coming in with the expected 
data, it's getting dropped before it gets into cacti because of the malformed 
packet.

This would explain why removing a unit on the expansion bus changed things...



bp




On 10/24/2014 1:32 PM, Bill Prince via Af wrote:

OK. Confirmed.  The SiteMonitor is getting the SNMP requests, and it is 
responding with the expected values.

I ran a pcap trace both at the SiteMonitor as well as at the ethernet port on 
the cacti server.  SNMP requests/responses are going both ways (and at both 
ends). In fact, spine appears to be doing 3 retries.

One thing I didn't expect is that just before the SNMP requests, there are two 
attempts to open a telnet on the SiteMonitor.  Not sure where that is coming 
from, except perhaps for the Manage plugin (which I de-installed several weeks 
ago).

So something is broken inside cacti.  How/why this was caused by zeroing a 
serial number from a non-existent expansion unit is completely baffling to me.

I also have no clue how to fix it, because cacti "thinks" there was no response.



bp

On 10/24/2014 11:16 AM, George Skorup (Cyber Broadcasting) via Af wrote:

I am thoroughly confused. Is your community string correct? Can you increase 
the device SNMP timeout, like 1000ms instead of 250ms? What's your device down 
detection set to? Is it showing down in the device list?

I have seen some base units go kinda screwy and respond slower and a reboot 
doesn't fix it, they needed a power-cycle.

On 10/24/2014 11:25 AM, Bill Prince via Af wrote:

Now thrice.

No joy in Mudville.




bp

On 10/24/2014 8:07 AM, Bill Prince via Af wrote:

Yah.  Twice now.




bp

On 10/23/2014 11:06 PM, George Skorup (Cyber Broadcasting) via Af wrote:

Gotta be the poller cache. Did you try a rebuild?

On 10/23/2014 11:03 PM, Bill Prince via Af wrote:

Getting closer.  When I look in the SNMP cache, there is no entry for the 
device.

Looking in the log (without debug), I get:

10/23/2014 08:34:25 PM - SPINE: Poller[0] Host[797] TH[1] DS[12316] WARNING: 
SNMP timeout detected [250 ms], ignoring host '10.13.114.254'
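
For anyone scanning a log for these warnings, here is a minimal sketch that pulls the host ID, data source ID, timeout, and address out of a line of this shape. The regular expression is an assumption based only on the single line quoted above (with the link markup removed), not on any documented log format, and `parse_timeout` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one sample line; other log lines may differ.
LINE_RE = re.compile(
    r"Host\[\s*(?P<host_id>\d+)\]\s+TH\[(?P<thread>\d+)\]\s+"
    r"DS\[\s*(?P<ds_id>\d+)\]\s+"
    r"WARNING: SNMP timeout detected \[(?P<timeout_ms>\d+) ms\], "
    r"ignoring host '(?P<address>[^']+)'"
)


def parse_timeout(line):
    """Return a dict of fields from an SNMP timeout warning, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; the address stays a string.
    for key in ("host_id", "thread", "ds_id", "timeout_ms"):
        fields[key] = int(fields[key])
    return fields


sample = (
    "10/23/2014 08:34:25 PM - SPINE: Poller[0] Host[797] TH[1] DS[12316] "
    "WARNING: SNMP timeout detected [250 ms], ignoring host '10.13.114.254'"
)
print(parse_timeout(sample))
```

Grouping the results by `host_id` would show whether the timeouts are confined to one device or spread across the poller.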

So there is something causing the SNMP request to barf inside cacti.  When I 
do an snmpget from the CLI, it all looks fine.  Likewise, the realtime plugin 
is working fine too.

So when realtime is doing the SNMP queries outside the poller, they are 
fine.  They only fail when spine is doing the SNMP requests.





bp

On 10/23/2014 4:12 PM, George Skorup (Cyber Broadcasting) via Af wrote:

You divided by zero, didn't you? 

Are you sure your modules are in the same order as before? 

On 10/23/2014 1:29 PM, Bill Prince via Af wrote: 




I noticed an "Expansion Unit" on one of my SiteMonitors this morning.  It 
said something about "Device Removed" or something like that. 

Remembering the discussion the other day on this topic, I put a "0" in the 
Serial # for the non-existent unit, rescanned, & rebooted. 

Now, none of the OIDs work in Cacti.  If I do a simple snmpget on any of the 
OIDs that I use, the correct information comes back. Several of the OIDs are on 
the base unit anyway, so they would not have moved, and further, the OIDs don't 
reference the serial number. 

So... what did I do, and how do I fix it?