Hello Fellow Users of InterMapper!
I am using the custom-snmp NetApp probe written by Mr. Stewart Harris and would
like to be able to throw an alarm on a specific percentage of disk use.
My understanding is that it currently alarms only on the following
conditions:
<snmp-device-thresholds>
alarm: ($miscGlobalStatus <> 3)&&($miscGlobalStatus <> 4) "$miscGlobalStatusMessage"
alarm: $diskFailedCount > 0 "$diskFailedMessage"
alarm: $fsOverallStatus > 1 "$fsStatusMessage"
warning: $miscGlobalStatus = 4 "$miscGlobalStatusMessage"
</snmp-device-thresholds>
An alarm message such as the one below is then generated from the
$fsOverallStatus variable:
01/19 06:56:14: Message from InterMapper 5.5.6
Event: Alarm
Name: filer02
Document: Storage
Address: 192.168.x.y
Probe Type: SNMP-SHEF- Netapp Status (port 161 SNMPv2c)
Condition: /vol/NA01_NFS03 is nearly full (using or reserving 95% of space and 0% of inodes, using 95% of reserve).
What I would like to do is alarm as soon as 70% of the disk is in use.
I'm thinking about adding the following to the <snmp-device-thresholds> section:
alarm: $fsMaxUsedBytesPerCent > 70, "Alarm Message"
Would this work to alarm me sooner, at a predefined percentage? I looked up
the OID for the NetApp fsMaxUsedBytesPerCent, and it simply returns an
integer from 0 through 100.
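For reference, here is a rough sketch of how I imagine the filesystem thresholds would look with the extra line in place. The existing lines in the probe have no comma between the condition and the message, so I have assumed the same form here. The message text is only a placeholder of mine, and I have not tested any of this.

```xml
<!-- Untested sketch. Assumption: no comma between condition and message, as in the existing lines. -->
<snmp-device-thresholds>
alarm: $fsOverallStatus > 1 "$fsStatusMessage"
alarm: $fsMaxUsedBytesPerCent > 70 "Fullest filesystem is at $fsMaxUsedBytesPerCent % of capacity"
</snmp-device-thresholds>
```

As I read the variable name, fsMaxUsedBytesPerCent reports only the fullest filesystem on the filer, so this would alarm when any one volume passes 70%, without saying which one.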
Thanks for your input,
Reza
<!--
Netapp - (shef.ac.uk.snmp.netappstatus)
Custom Probe for InterMapper (http://www.intermapper.com)
Grabs useful status variables about a Netapp filer
assumes that the filer is clustered
SMH 20/10/06 main development
23/10/06 alarms changed to better reflect what happens at takeover (yes, it's happened for real)
-->
<header>
"type" = "custom-snmp"
"package" = "shef.ac.uk"
"probe_name" = "snmp.netappstatus"
"human_name" = "SNMP-SHEF- Netapp Status"
"version" = "1.01"
"address_type" = "IP"
"port_number" = "161"
"display_name" = "SNMP-SHEF-Netapp Status"
</header>
<description>
\g0i++\For Netapp Filers (clustered)\p--\
\b++\A. Probed MIB(s)\p--\
\b\Netapp MIB\p\
\b++\B. Displayed Values\p--\
General Info: eg Temperature, cache, Nvram battery, CPU utilisation
Status Summaries: Global, Filesystem, Fans, Power
Cluster Status: During takeover, some info is not returned from node that has
been taken over.
Disk Summary: How many, spares, failed, prefailed
\b++\C. Alarms\p--\
Global Status not OK or non-critical
Disk failure
Some parts of filesystem full or nearly full
\b++\D. Warnings\p--\
Global Status non-critical
\bG3\Bugs or Requests? \p\\iU=mailto:s.har...@sheffield.ac.uk\Please contact Stewart Harris.\p\
</description>
<snmp-device-variables>
<!-- Product -->
productVersion, 1.3.6.1.4.1.789.1.1.2.0, DEFAULT, "" <!-- productVersion -->
productGuiUrl, 1.3.6.1.4.1.789.1.1.7.0, DEFAULT, "" <!-- productGuiUrl -->
<!-- CPU -->
cpuBusyTimePerCent, 1.3.6.1.4.1.789.1.2.1.3.0, DEFAULT, "CPU Util" <!-- cpuBusyTimePerCent since last request -->
<!-- Misc -->
miscGlobalStatus, 1.3.6.1.4.1.789.1.2.2.4.0, DEFAULT, "" <!-- miscGlobalStatus (1-6, 3 OK) -->
miscCacheAge, 1.3.6.1.4.1.789.1.2.2.23.0, DEFAULT, "" <!-- miscCacheAge mins -->
miscGlobalStatusMessage, 1.3.6.1.4.1.789.1.2.2.25.0, DEFAULT, "" <!-- miscGlobalStatusMessage -->
<!-- Cluster status: beware, may not be accurate if failover occurred and IP addresses move -->
cfSettings, 1.3.6.1.4.1.789.1.2.3.1.0, DEFAULT, "" <!-- cfSettings (1-5) -->
cfState, 1.3.6.1.4.1.789.1.2.3.2.0, DEFAULT, "" <!-- cfState (1-4) -->
cfCannotTakeOvercause, 1.3.6.1.4.1.789.1.2.3.3.0, DEFAULT, "" <!-- cfCannotTakeOvercause (1-6) -->
cfPartnerStatus, 1.3.6.1.4.1.789.1.2.3.4.0, DEFAULT, "" <!-- cfPartnerStatus (1-3) -->
cfPartnerName, 1.3.6.1.4.1.789.1.2.3.6.0, DEFAULT, "" <!-- cfPartnerName -->
cfInterConnectStatus, 1.3.6.1.4.1.789.1.2.3.8.0, DEFAULT, "" <!-- cfInterConnectStatus (1-4) -->
<!-- Environment variables -->
envOverTemperature, 1.3.6.1.4.1.789.1.2.4.1.0, DEFAULT, "" <!-- envOverTemperature (1-2) -->
envFailedFanCount, 1.3.6.1.4.1.789.1.2.4.2.0, DEFAULT, "" <!-- envFailedFanCount -->
envFailedFanMessage, 1.3.6.1.4.1.789.1.2.4.3.0, DEFAULT, "" <!-- envFailedFanMessage -->
envFailedPowerSupplyCount, 1.3.6.1.4.1.789.1.2.4.4.0, DEFAULT, "" <!-- envFailedPowerSupplyCount -->
envFailedPowerSupplyMessage, 1.3.6.1.4.1.789.1.2.4.5.0, DEFAULT, "" <!-- envFailedPowerSupplyMessage -->
<!-- Nvram -->
nvramBatteryStatus, 1.3.6.1.4.1.789.1.2.5.1.0, DEFAULT, "" <!-- nvramBatteryStatus (1-9) -->
<!-- FileSystem -->
fsOverallStatus, 1.3.6.1.4.1.789.1.5.7.1.0, DEFAULT, "" <!-- fsOverallStatus (1-3) -->
fsStatusMessage, 1.3.6.1.4.1.789.1.5.7.2.0, DEFAULT, "" <!-- fsStatusMessage -->
fsMaxUsedBytesPerCent, 1.3.6.1.4.1.789.1.5.7.3.0, DEFAULT, "" <!-- fsMaxUsedBytesPerCent -->
<!-- Raid -->
diskTotalCount, 1.3.6.1.4.1.789.1.6.4.1.0, DEFAULT, "" <!-- diskTotalCount -->
diskActiveCount, 1.3.6.1.4.1.789.1.6.4.2.0, DEFAULT, "" <!-- diskActiveCount -->
diskReconstructingCount, 1.3.6.1.4.1.789.1.6.4.3.0, DEFAULT, "" <!-- diskReconstructingCount -->
diskReconstructingParityCount, 1.3.6.1.4.1.789.1.6.4.4.0, DEFAULT, "" <!-- diskReconstructingParityCount -->
diskFailedCount, 1.3.6.1.4.1.789.1.6.4.7.0, DEFAULT, "" <!-- diskFailedCount -->
diskSpareCount, 1.3.6.1.4.1.789.1.6.4.8.0, DEFAULT, "" <!-- diskSpareCount -->
diskFailedMessage, 1.3.6.1.4.1.789.1.6.4.10.0, DEFAULT, "" <!-- diskFailedMessage -->
diskPreFailedCount, 1.3.6.1.4.1.789.1.6.4.11.0, DEFAULT, "" <!-- diskPreFailedCount -->
<!-- Code-to-Text Conversions -->
DispenvOverTemperature, ($envOverTemperature=1)?"OK":($envOverTemperature=2)?"Over Temp":"$envOverTemperature (Unknown)", CALCULATION
DispmiscGlobalStatus, ($miscGlobalStatus=3)?"OK":($miscGlobalStatus=4)?"Non-Critical":($miscGlobalStatus=5)?"Critical":($miscGlobalStatus=6)?"Non-Recoverable":($miscGlobalStatus=2)?"Unknown":($miscGlobalStatus=1)?"Other":"$miscGlobalStatus (unknown)", CALCULATION
DispcfSettings, ($cfSettings=2)?"Enabled":($cfSettings=3)?"Disabled":($cfSettings=4)?"Takeover by partner disabled":($cfSettings=5)?"this node dead":($cfSettings=1)?"Not Configured":"$cfSettings (unknown)", CALCULATION
DispcfState, ($cfState=2)?"Can takeover":($cfState=3)?"Cannot takeover":($cfState=4)?"Takeover":($cfState=1)?"Dead":"$cfState (unknown)", CALCULATION
DispcfPartnerStatus, ($cfPartnerStatus=2)?"OK":($cfPartnerStatus=1)?"Maybe Down":($cfPartnerStatus=3)?"Dead":"$cfPartnerStatus (unknown)", CALCULATION
DispcfInterConnectStatus, ($cfInterConnectStatus=4)?"Up":($cfInterConnectStatus=3)?"Partial failure":($cfInterConnectStatus=2)?"Down":($cfInterConnectStatus=1)?"Not Present":"$cfInterConnectStatus (unknown)", CALCULATION
DispnvramBatteryStatus1, ($nvramBatteryStatus=1)?"OK":($nvramBatteryStatus=2)?"Partially Discharged":($nvramBatteryStatus=3)?"Fully discharged":($nvramBatteryStatus=4)?"Not present":"$nvramBatteryStatus (unknown)", CALCULATION
DispnvramBatteryStatus2, ($nvramBatteryStatus=5)?"Near End of Life":($nvramBatteryStatus=6)?"At end of life":($nvramBatteryStatus=7)?"Unknown":($nvramBatteryStatus=8)?"Over charged":($nvramBatteryStatus=9)?"Fully Charged":"$nvramBatteryStatus (unknown)", CALCULATION
DispnvramBatteryStatus, ($nvramBatteryStatus < 6)?"$DispnvramBatteryStatus1":"$DispnvramBatteryStatus2", CALCULATION
</snmp-device-variables>
<snmp-device-thresholds>
alarm: ($miscGlobalStatus <> 3)&&($miscGlobalStatus <> 4) "$miscGlobalStatusMessage"
alarm: $diskFailedCount > 0 "$diskFailedMessage"
alarm: $fsOverallStatus > 1 "$fsStatusMessage"
warning: $miscGlobalStatus = 4 "$miscGlobalStatusMessage"
</snmp-device-thresholds>
<snmp-device-display>
\bM5\General Information\p\
\pM4\ Temp: \p\\bG0\$DispenvOverTemperature\p\
\pM4\ Cache: \p\\bG0\$miscCacheAge mins\p\
\pM4\ NVRAM: \p\\bG0\$DispnvramBatteryStatus\p\
\pM4\ CPU: \p\\bG0\$cpuBusyTimePerCent %\p\
\bM5\Status Summaries\p\
\pM4\ Global: \p\\bG0\$DispmiscGlobalStatus\p\
\pM4\ \p\\bG0\$miscGlobalStatusMessage\p\
\pM4\ Filesystem: \p\\bG0\$fsStatusMessage\p\
\pM4\ Fans: \p\\bG0\$envFailedFanMessage\p\
\pM4\ Power: \p\\bG0\$envFailedPowerSupplyMessage\p\
\bM5\Cluster Status\p\
\pM4\ Settings: \p\\bG0\$DispcfSettings\p\
\pM4\ State: \p\\bG0\$DispcfState\p\
\pM4\ Partner: \p\\bG0\$cfPartnerName\p\ \pM4\ Status: \p\\bG0\$DispcfPartnerStatus\p\
\pM4\ Interconnect: \p\\bG0\$DispcfInterConnectStatus\p\
\bM5\Disk Summary\p\
\pM4\ Total: \p\\bM0\$diskTotalCount\p\ \pM4\ Spare: \p\\bM0\$diskSpareCount\p\
\pM4\ Failed: \p\\bM0\$diskFailedCount\p\ \pM4\ Prefailed: \p\\bM0\$diskPreFailedCount\p\
\bG3\Bugs or Requests? \p\\iG3\\U=mailto:s.har...@sheffield.ac.uk\Please contact Stewart Harris.\p\
</snmp-device-display>