Re: Temperature monitoring

2017-07-18 Thread Peter Beckman

Agreed -- there are already tons of temp sensors throughout old and new
hardware. I've used SCSI drive queries via sdparm and more recently hddtemp
to get the current temperature of the drives. No need for SNMP or ILO,
though that can give you a more detailed picture where possible.

You first monitor and record for 24 hours to get your baseline temp for a
given rack or server, then set your threshold, then let your monitoring
platform do the rest.
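
As a rough sketch of that baseline-then-threshold step (not a definitive recipe): assume you already have 24 hours of drive temperatures in Celsius as a list of numbers, however you collected them. The function names, the 3-sigma rule, and the 5 C minimum margin below are illustrative assumptions, not something hddtemp, sdparm, or any monitoring platform prescribes.

```python
from statistics import mean, pstdev


def alert_threshold(baseline_temps_c, sigmas=3.0, min_margin_c=5.0):
    """Derive an alert threshold from a baseline window of readings.

    The margin above the baseline average is the larger of
    sigmas * standard deviation and a fixed minimum, so a very
    stable baseline does not produce a hair-trigger alert.
    Both defaults are assumptions chosen for illustration.
    """
    if not baseline_temps_c:
        raise ValueError("need at least one baseline reading")
    average = mean(baseline_temps_c)
    margin = max(sigmas * pstdev(baseline_temps_c), min_margin_c)
    return average + margin


def over_threshold(reading_c, threshold_c):
    """True when a new reading exceeds the threshold."""
    return reading_c > threshold_c


# Hypothetical baseline for one server's drives over a day.
baseline = [30, 31, 32, 31, 30, 32]
threshold = alert_threshold(baseline)
```

The threshold would then be handed to whatever monitoring platform you use; the per-rack or per-server baseline is the part that matters.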

Since I use hosted dedicated servers, I don't want to pay for yet another
device. By monitoring only those disk temps I've caught two cooling issues
before they became a crisis, one of which my hosting provider was not aware
of.

If you control the hardware, or at least have access to it, there should be
enough sensors to let you know at least something is causing a problem.

Beckman

On Thu, 13 Jul 2017, Andrew Latham wrote:


On Thu, Jul 13, 2017 at 9:33 PM, Dovid Bender  wrote:


All,

We had an issue with a DC where temps were elevated. The one bit of
hardware that wasn't watched much was the one that sent out the initial
alert. Looking for recommendations on hardware that I can mount/hang in
each cabinet that is easy to set up and will alert us if temps go beyond a
certain point.

TIA.

Dovid



Almost everything has temperature sensors: switches, servers, and most
modern PDUs. A dedicated solution is just creating the problem again in the
future. Monitor the temps on everything and gain knowledge related to
failure rates. Most companies with physical infrastructure could pay for
another engineer to discover these unexpected expenses. Also note that
modern air conditioning and refrigeration have SNMP or BACnet protocol
support; just download the manual.

--
- Andrew "lathama" Latham -



---
Peter Beckman  Internet Guy
beck...@angryox.com http://www.angryox.com/
---


RE: Temperature monitoring

2017-07-18 Thread Edwin Pers
+1 for the serverscheck.com gear. Been running it as a humidity monitor in the 
plant for a year or so now and it's been rock solid. If you're the kind of shop 
that requires calibration for that sort of equipment they'll handle that as 
well. Great company to work with. Pair it with Cacti + thold plugin or whatever 
other snmp monitoring you like - or the base units can handle alerting on their 
own.

FYI for those interested - the stated max length of connecting cable between 
the base station and the sensor units (30ft iirc) is way under what it'll do in 
the real world - I've got at least one sensor unit that's a good 500ft away 
from the base station and it's been working just fine.

Ed Pers

-----Original Message-----
From: NANOG [mailto:nanog-boun...@nanog.org] On Behalf Of David Charlebois
Sent: Sunday, July 16, 2017 10:02 PM
To: NANOG 
Subject: Re: Temperature monitoring

we use: https://serverscheck.com/sensors/ - simple setup, graph nicely in 
Cacti. I went with ServersCheck wired units + external temp+humidity 
probe. The base unit displays the temperature which is a nice quick reference 
if you are in the room.

On Fri, Jul 14, 2017 at 8:31 AM, Dan White  wrote:

> We use Asentria.
>
> On 07/13/17 22:33 -0400, Dovid Bender wrote:
>
>> All,
>>
>> We had an issue with a DC where temps were elevated. The one bit of 
>> hardware that wasn't watched much was the one that sent out the 
>> initial alert. Looking for recommendations on hardware that I can 
>> mount/hang in each cabinet that is easy to set up and will alert us 
>> if temps go beyond a certain point.
>>
>
> --
> Dan White
> BTC Broadband
> Network Admin Lead
> Ph  918.366.0248 (direct)   main: (918)366-8000
> Fax 918.366.6610   email: dwh...@olp.net
> http://www.btcbroadband.com
>


Re: Zabbix IT Services feature set

2017-07-18 Thread valdis . kletnieks
On Tue, 18 Jul 2017 14:33:19 -, Graham Johnston said:
> My question is, has anyone gotten the Zabbix IT Services to work correctly?  
> Is there a trick to getting it to work, some configuration we are doing 
> incorrectly?

We're a Zabbix shop, with a large number of boxes being monitored.

This may or may not be your problem, but it bit me big time when we were
first getting it up and running.  There's a "gotcha" with triggers, in that
they may have *TWO* values to provide hysteresis.  So if you have a trigger set
to go off at 25 wombats/second, and your system hits 32 wps, the trigger will
flag a problem.  It will *continue* doing so, *not* until it drops below
25 wps, but until it drops down to the "clear" value (for example 10
wombats/sec).
So you can be sitting at 11 or 13 or 12 for a long time, but it won't go
to OK until it's below 10 when Zabbix checks. (A side effect is if it
manages to have a very short dip to 9.8 wps, and back up to 13, you'll be
scratching your head wondering how it went to OK. :)
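
The two-value behavior above can be sketched as a tiny state machine. This is only an illustration of hysteresis in general, using the 25 and 10 wombats/sec figures from the example; the class and parameter names are made up here and are not Zabbix's trigger syntax or API.

```python
class HysteresisTrigger:
    """Two-threshold trigger: enters PROBLEM above one value,
    returns to OK only after dropping below a lower 'clear' value."""

    def __init__(self, problem_above, clear_below):
        if clear_below > problem_above:
            raise ValueError("clear value must not exceed problem value")
        self.problem_above = problem_above
        self.clear_below = clear_below
        self.in_problem = False

    def update(self, value):
        """Feed one polled value; return True while in PROBLEM state."""
        if self.in_problem:
            # Already alerting: only the clear value can end it.
            if value < self.clear_below:
                self.in_problem = False
        elif value > self.problem_above:
            self.in_problem = True
        return self.in_problem


trigger = HysteresisTrigger(problem_above=25, clear_below=10)
samples = [5, 32, 13, 11, 12, 9.8, 13]  # wombats/sec at each check
states = [trigger.update(v) for v in samples]
```

With these samples the trigger stays in PROBLEM through 13, 11 and 12, flips to OK on the brief dip to 9.8, and stays OK when the value returns to 13, which is the head-scratching case mentioned above.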

(And then of course there's the "somebody had a wild hair" cases where the
trigger into trouble state is one hand-coded expression checking one thing, and
the "OK" trigger checks something entirely different. :)

Hope that helps.




Dark Fiber Ring - IRU

2017-07-18 Thread Rod Beck
Looking for vendors who can do a ring from Hamilton to Toronto. 15 years. Only 
IRUs. I assume Zayo and Beanstalk can do it. Any other parties?


Roderick Beck

Director of Global Sales

United Cable Company

DRG Undersea Consulting

Affiliate Member

www.unitedcablecompany.com

85 Király utca, 1077 Budapest

rod.b...@unitedcablecompany.com

36-30-859-5144




Zabbix IT Services feature set

2017-07-18 Thread Graham Johnston
Hi,

We have the Zabbix IT Services (running on Zabbix 3.2) configured for some test 
groups.  It usually returns good data but occasionally it seems that one 
service group or trigger will get stuck in an alerting state and provide an 
incorrect SLA.  This can occur if the trigger has changed to a problem state 
and then back to OK but IT Services doesn't reflect that change.  When it 
happens, the top-level group shows 100% problem time while the subgroups 
and items have either no problem time or so little that it wouldn't 
account for 100% problem time.

We have it built with some groups under root, some subgroups and items, and 
each item has a trigger associated with it.  We followed this article to the 
best of our knowledge: 
https://www.zabbix.com/documentation/3.2/manual/it_services
 
For Example: 
|Data Center 
|-Core1
|--Core1 - ICMP - Trigger
|-Core2
|--Core2 - ICMP - Trigger

Each subitem is a child of the item above it.  We haven't configured any 
dependencies to any other groups or items.
My question is, has anyone gotten the Zabbix IT Services to work correctly?  Is 
there a trick to getting it to work, some configuration we are doing 
incorrectly?
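
One way to sanity-check the symptom described above, as a sketch only: assume the parent service uses the "Problem, if at least one child has a problem" calculation, so its problem time should be roughly the merged problem time of its children. The function names, the (start, end) interval format in seconds, and the 60-second slack are hypothetical; nothing here calls Zabbix or reflects how it computes SLA internally.

```python
def merged_problem_seconds(intervals):
    """Total seconds covered by (start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def parent_looks_stuck(parent_problem_seconds, child_intervals, slack_seconds=60):
    """True when the parent reports far more problem time than its children explain."""
    expected = merged_problem_seconds(child_intervals)
    return parent_problem_seconds > expected + slack_seconds


# Hypothetical day: Core1 and Core2 each had one short ICMP outage.
children = [(0, 120), (60, 180)]
stuck = parent_looks_stuck(86400, children)
```

A parent reporting a full day of problem time against three minutes of child outages would be flagged, which matches the mismatch reported here.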

Thanks,
Graham




PCCW contact

2017-07-18 Thread JASON BOTHE

Hey NANOGers, I'm hoping to find a contact for PCCW in Hong Kong to assist with 
a BGP announcement issue. Not having any luck through the front door. 

Thanks!

Jason

