Re: Pinging a Device Every Second

2018-12-28 Thread Dale W. Carder
Thus spake Christian Meutes (christ...@errxtx.net) on Fri, Dec 21, 2018 at 
02:41:23PM +0100:
> Depending on your requirements and scale - but I read you want history -
> it's probably less a demand on CPU or network resources, but more on IOPS.
> 
> If you cache all results before writing to disk, then it's not much of a
> problem, but by just going "let's use RRD/MRTG for this" your IOPS could
> become the first problem. So you might look into a proper timeseries
> backend or use a caching daemon for RRD.

Having once written a caching daemon for mrtg/rrdtool, I'd say the advent of
SSD arrays has made IOPS largely irrelevant.  (I had ~ 1.2M targets in mrtg
on that machine)

Dale
 
 
> On Sat, Dec 15, 2018 at 4:48 PM Colton Conor  wrote:
> 
> > How much compute and network resources does it take for a NMS to:
> >
> > 1. ICMP ping a device every second
> > 2. Record these results.
> > 3. Report an alarm after so many seconds of missed pings.
> >
> > We are looking for a system to in near real-time monitor if an end
> > customers router is up or down. SNMP I assume would be too resource
> > intensive, so ICMP pings seem like the only logical solution.
> >
> > The question is once a second pings too polling on an NMS and a consumer
> > grade router? Does it take much network bandwidth and CPU resources from
> > both the NMS and CPE side?
> >
> > Lets say this is for a 1,000 customer ISP.
> >
> >
> >
> 
> -- 
> Christian Meutes
> 
> e-mail/xmpp: christ...@errxtx.net
> mobile: +49 176 32370305
> PGP Fingerprint: B458 E4D6 7173 A8C4 9C75315B 709C 295B FA53 2318
> Toulouser Allee 21, 40211 Duesseldorf, Germany


Re: Pinging a Device Every Second

2018-12-21 Thread Christian Meutes
Depending on your requirements and scale - but I read you want history -
it's probably less a demand on CPU or network resources, but more on IOPS.

If you cache all results before writing to disk, then it's not much of a
problem, but by just going "let's use RRD/MRTG for this" your IOPS could
become the first problem. So you might look into a proper timeseries
backend or use a caching daemon for RRD.
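
A rough sketch of the "cache before writing" idea, in Python. The class
name, the batch size and the log format are all made up for illustration;
this is not how any particular caching daemon or timeseries backend works,
it only shows why batching cuts the number of writes:

```python
import os
import tempfile
import time


class BufferedResultWriter:
    """Collect ping results in memory and append them to disk in batches."""

    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every  # hypothetical batch size
        self.buffer = []
        self.writes = 0  # number of disk writes issued, for illustration

    def record(self, host, timestamp, rtt_ms):
        # rtt_ms is None when the ping was lost.
        self.buffer.append((host, timestamp, rtt_ms))
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        lines = [
            f"{ts:.3f} {host} {'lost' if rtt is None else rtt}\n"
            for host, ts, rtt in self.buffer
        ]
        # One append for the whole batch instead of one write per sample.
        with open(self.path, "a") as f:
            f.writelines(lines)
        self.writes += 1
        self.buffer.clear()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "pings.log")
    writer = BufferedResultWriter(path, flush_every=1000)
    now = time.time()
    # Simulate 5 one-second cycles over 1,000 hosts (5,000 samples).
    for cycle in range(5):
        for n in range(1000):
            writer.record(f"cpe-{n}", now + cycle, 12.5)
    writer.flush()
    with open(path) as f:
        stored = sum(1 for _ in f)
    return writer.writes, stored
```

With these numbers, 5,000 samples reach the disk in 5 appends rather than
5,000 separate updates.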


On Sat, Dec 15, 2018 at 4:48 PM Colton Conor  wrote:

> How much compute and network resources does it take for a NMS to:
>
> 1. ICMP ping a device every second
> 2. Record these results.
> 3. Report an alarm after so many seconds of missed pings.
>
> We are looking for a system to in near real-time monitor if an end
> customers router is up or down. SNMP I assume would be too resource
> intensive, so ICMP pings seem like the only logical solution.
>
> The question is once a second pings too polling on an NMS and a consumer
> grade router? Does it take much network bandwidth and CPU resources from
> both the NMS and CPE side?
>
> Lets say this is for a 1,000 customer ISP.
>
>
>

-- 
Christian Meutes

e-mail/xmpp: christ...@errxtx.net
mobile: +49 176 32370305
PGP Fingerprint: B458 E4D6 7173 A8C4 9C75315B 709C 295B FA53 2318
Toulouser Allee 21, 40211 Duesseldorf, Germany


Re: Pinging a Device Every Second

2018-12-20 Thread Olav Kvittem
Hi,

The link is not the only component to fail - routers and routing protocols
all contribute at least as much.
If your customers have redundant connections,
you would also want to look at convergence times.
So an end-to-end measurement by a probe in the customer's network could give
you a truer picture.
Given that even sub-second outages can annoy a video meeting,
you might want to poll more often than once a second.

Once you realize that your "internet service" depends on the behaviour of all
the other service providers' quality, and if you even start monitoring that,
you understand that you are "in deep shit" ;-)


I did a small-scale global inter-domain measurement and discovered that the
sheer number of small outages is way too high.

Many of them might be routing changeovers in multi-redundant networks.

cheers
Olav

On 15.12.2018 18:55, Tim Pozar wrote:
> In one of my client's companies, we use LibreNMS. It is normally used
> to get SNMP data, but we also have it configured to ping our more
> "high touch" clients' routers. In that case we can record performance
> such as latency and packet loss. It will generate graphs that we can
> pass on to the client. It also can be set to alert us if a client's
> router is not pingable.
>
> LibreNMS can also integrate Smokeping if you want Smokeping-style
> graphs showing standard deviation, etc.
>
> Currently I am running LibreNMS on a VM on a Proxmox cluster with a
> couple of cores. It is probing 385 devices every 5 minutes and
> keeping up with that. In polling, SNMP is the real time and CPU hog
> where ping is pretty low impact.
>
> Tim
>
> On 12/15/18 9:37 AM, Baldur Norddahl wrote:
>> You could configure BFD to send out a SNMP alert when three packets
>> have been missed on a 50 ms cycle. Or instantly if the interface
>> changes state to down. This way you would know that they are down
>> within 150 ms.
>>
>> BFD is the hardware solution. A Linux box that has to ping 1000
>> addresses per second will be very taxed and likely unable to do
>> that in a stable way. You will have seconds where it fails to do
>> them all followed by seconds where it attempts to do them more than
>> once. The result is that the statistics gathered are worthless. If
>> you do something like this, it is much better to have a less
>> ambitious 1 minute cycle.
>>
>> Take a look at Smokeping. If you want a graph to show the quality
>> of the line, Smokeping makes some very good graphs for that.
>>
>> Regards Baldur
>>
>> On 15 Dec 2018 at 16:49, "Colton Conor" wrote:
>>
>> How much compute and network resources does it take for a NMS to:
>>
>> 1. ICMP ping a device every second
>> 2. Record these results.
>> 3. Report an alarm after so many seconds of missed pings.
>>
>> We are looking for a system to in near real-time monitor if an end
>> customers router is up or down. SNMP I assume would be too
>> resource intensive, so ICMP pings seem like the only logical
>> solution.
>>
>> The question is once a second pings too polling on an NMS and a
>> consumer grade router? Does it take much network bandwidth and CPU
>> resources from both the NMS and CPE side?
>>
>> Lets say this is for a 1,000 customer ISP.



Re: Pinging a Device Every Second

2018-12-16 Thread Richard Holbo
YMMV... but most of the CPE routers I've seen lately have ICMP turned
off by default, so you'll be messing with settings in the customer
router.  Do you provide the router? I also agree with Baldur: 2
minutes... is more than likely the customer router rebooting itself or
something like that.  If they support SNMP at ALL, uptime is a VERY
useful OID.  I've finally given up and started to provide the customer
CPE.. since we're going to get the blame anyway... might as well be
able to monitor it in a fashion that we can choose and charge another
$10 a month for managed router.

TR-069 has settings to change the update frequency as well and it can
be persuaded to provide SNMPish information.

I also run a smokeping for _special_ customers.  I've found that 20
rapid pings every 1 minute gives me pretty good stats on jitter and if
they really are having an issue, I'll see it at that granularity.
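
For what it's worth, here is roughly how one burst of 20 pings could be
boiled down to loss and jitter figures. This is only a sketch in Python,
not what Smokeping does internally; the function name is made up, and
"jitter" is taken here as the mean absolute difference between
consecutive answered RTTs, which is just one of several definitions:

```python
import statistics


def summarize_burst(rtts_ms):
    """Summarize one burst of pings; None marks a lost reply."""
    answered = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(answered)) / sent if sent else 0.0
    if not answered:
        return {"loss_pct": loss_pct, "median_ms": None, "jitter_ms": None}
    # Mean absolute difference between consecutive answered RTTs.
    diffs = [abs(b - a) for a, b in zip(answered, answered[1:])]
    jitter = statistics.mean(diffs) if diffs else 0.0
    return {
        "loss_pct": loss_pct,
        "median_ms": statistics.median(answered),
        "jitter_ms": jitter,
    }
```

A burst of 18 replies at 10 ms, one loss and one reply at 20 ms would
come out as 5% loss with a 10 ms median.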

/rh

On Sat, Dec 15, 2018 at 5:22 PM Baldur Norddahl
 wrote:
>
> Hi
>
> Customers do not usually complain about 2 minutes of downtime unless it is a 
> repeating event. We will therefore offer such customers to put their line on 
> monitor mode, which means we will add them to smokeping. You could also start 
> the ping once a second thing, which would be no problem if it is only a few 
> customers on monitor mode.
>
> However 2 minutes of downtime is a symptom of bad wifi more often than the 
> internet connection.
>
> Regards,
>
> Baldur
>
>
> On Sat, Dec 15, 2018 at 7:33 PM Colton Conor  wrote:
>>
>> The problem I am trying to solve is to accurately be able to tell a customer 
>> if their home internet connection was up or down.  Example, customer calls 
>> in and says my internet was down for 2 minutes yesterday. We need to be able 
>> to verify that their internet connection was indeed down. Right now we have 
>> no easy way to do this.  Getting metrics like packet loss and jitter would 
>> be great too, though I realize ICMP data path does not always equal customer 
>> experience as many network device prioritize ICMP traffic. However ICMP 
>> pings over the internet do usually accurately tell if a customers modem is 
>> indeed online or not.
>>
>> Most devices out in the field like ONT's and DSL modems do not support SNMP 
>> but rather use TR-069 for management. Most of these devices only check into 
>> the TR-069 ACS server once a day.
>> If the consumer device does support SNMP, they usually have weak broadcom or 
>> qualcom SoC processors, outdated linux kernel embedded operating systems, 
>> limited ram, and storage. Most of these can't handle SNMP walks every minute 
>> let alone every 5. We are talking about sub $100 routers here not Juniper, 
>> Cisco, Arista, etc.
>>
>> Most all of these consumer devices are connected to an carrier aggregation 
>> device like a DSLAM, OLT, ethernet switch, or wireless access point. These 
>> access devices do support SNMP, but most manufactures recommend only 5 
>> minute SNMP poling, so a 2 minute outage would not easily be detected. Plus 
>> its hard to correlate that consumer X is on port Y on access switch, and get 
>> that right for a tier 1 CSR.
>>
>> The only two ways I think I can accomplish this is:
>> 1. ICMP pings to a device every so many seconds. Almost every device 
>> supports responding to WAN ICMP pings.
>> or
>> 2. IPFIX sampling at core router, and then drilling down by customer IP. I 
>> think this will tell me if any data was flowing to this customers IP on a 
>> second by second basis, but won't necessarily give us an up or down 
>> indicator. Requires nothing from the consumer's router.
>>
>>
>>
>>
>>
>> On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell  wrote:
>>>
>>> On 12/15/18 7:48 AM, Colton Conor wrote:
>>> > How much compute and network resources does it take for a NMS to:
>>> >
>>> > 1. ICMP ping a device every second
>>> > 2. Record these results.
>>> > 3. Report an alarm after so many seconds of missed pings.
>>> >
>>> > We are looking for a system to in near real-time monitor if an end
>>> > customers router is up or down. SNMP I assume would be too resource
>>> > intensive, so ICMP pings seem like the only logical solution.
>>> >
>>> > The question is once a second pings too polling on an NMS and a consumer
>>> > grade router? Does it take much network bandwidth and CPU resources from
>>> > both the NMS and CPE side?
>>> >
>>> > Lets say this is for a 1,000 customer ISP.
>>>
>>> What problem are you trying to solve, exactly?  That more than anything
>>> will dictate what you do.
>>>
>>> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
>>> remote device is almost invisible.  Remember the only real difference
>>> between ping and SNMP monitoring (UDP) is the organization of the bits
>>> in the packet and the protocol number in the IP header.  It's still one
>>> packet pair exchanged, unless you get really ambitious with your SNMP
>>> OID list.
>>>
>>> When I was in a medium-sized hosting company, I developed an 

Re: Pinging a Device Every Second

2018-12-16 Thread Saku Ytti
On Sun, 16 Dec 2018 at 17:59, Stephen Satchell  wrote:

> A standard ping packet, with no IP options or additional payload, is 64
> bytes or 512 bits.  If an application wants to make an accurate
> round-trip-delay measurement, it can insert the output of a microsecond
> clock, and compare that value for when the answer packet comes back.
> Add at least 32 bits, perhaps 64.

Unsure about standard, but Linux iputils ping does this:

╰─ ping -4 ftp.funet.fi
PING ftp.funet.fi (193.166.3.2) 56(84) bytes of data.
64 bytes from ftp.funet.fi (193.166.3.2): icmp_seq=1 ttl=243 time=47.8 ms

This means:

20B IPv4 Header
08B ICMP Header
56B ICMP data
--
84B IPv4 packet

Add to that Ethernet II encapsulation (14B header, 4B FCS, 8B preamble
and 12B inter-frame gap), and you have 122B or 976 bits per packet,
or 976kbps for 1k hosts at 1 pps.

The vast majority of that ICMP data is unnecessary trash. If you use
the minimum-size Ethernet II payload you have 18 bytes of ICMP data,
_free of charge_, which is plenty to add timestamping and what have you,
without increasing link utilisation. I.e. 672kbps for 1k hosts will
allow you to send 18B of arbitrary data, and there is no way to use
fewer bps.
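
The arithmetic above can be checked with a few lines of Python. The
constants are the usual Ethernet II overheads (header, FCS, preamble and
inter-frame gap); the function names are made up for this sketch, and the
figures are for one direction only:

```python
IPV4_HEADER = 20
ICMP_HEADER = 8
ETH_HEADER = 14       # dst MAC, src MAC, EtherType
ETH_FCS = 4
ETH_PREAMBLE = 8      # preamble + start-of-frame delimiter
ETH_IFG = 12          # inter-frame gap
ETH_MIN_PAYLOAD = 46  # minimum Ethernet II payload


def wire_bytes(icmp_data):
    """Bytes occupied on the wire by one ping with icmp_data bytes of data."""
    ip_packet = IPV4_HEADER + ICMP_HEADER + icmp_data
    payload = max(ip_packet, ETH_MIN_PAYLOAD)  # short frames are padded
    return ETH_PREAMBLE + ETH_HEADER + payload + ETH_FCS + ETH_IFG


def kbps(icmp_data, hosts=1000, pps=1):
    """Link utilisation in kbps for the given number of hosts and rate."""
    return wire_bytes(icmp_data) * 8 * hosts * pps / 1000
```

wire_bytes(56) gives 122 and kbps(56) gives 976.0, while anything from 0
to 18 bytes of ICMP data lands on the 84-byte minimum, i.e. 672.0 kbps.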

> I can see a network operator with a complex mesh network wanting to turn
> on Record Route (RFC791), which adds 24+(hops*32, max 536) bits to both
> ping and ping-response packets.

Be careful about what you intend to measure. I would try to measure
customer experience as much as possible. Packets with IP options are
punted to software processing; they may be forwarded differently, with
several orders of magnitude higher jitter and larger packet loss.

-- 
  ++ytti


Re: Pinging a Device Every Second

2018-12-16 Thread Stephen Satchell
On 12/16/18 12:07 AM, Saku Ytti wrote:
> On Sun, 16 Dec 2018 at 00:48, Stephen Satchell  wrote:
> 
>> The 1500 bits are for each ping.  So 1000 hosts would be 1,500,000 bits
> 
> Why? Why did you choose 1500b(it) ping, instead of minimum size or
> 1500B(ytes) IP packets?
> 
> Minimum: 672kbps
> 1500B: 12.16Mbps

I was going from memory, and it is by no means perfect.  But...

A standard ping packet, with no IP options or additional payload, is 64
bytes or 512 bits.  If an application wants to make an accurate
round-trip-delay measurement, it can insert the output of a microsecond
clock, and compare that value for when the answer packet comes back.
Add at least 32 bits, perhaps 64.
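
As an illustration of the timestamp idea, a minimal Python sketch follows.
It only builds and reads the 64-bit payload; actually sending the ICMP
echo is left out, and the helper names are invented for this example:

```python
import struct
import time

STAMP = struct.Struct("!Q")  # 64-bit unsigned, network byte order


def make_payload(now_us=None):
    """Return 8 bytes of ping payload holding a microsecond clock value."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    return STAMP.pack(now_us)


def rtt_ms(payload, now_us=None):
    """Compare the echoed timestamp with the current clock, in ms."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    (sent_us,) = STAMP.unpack(payload[:STAMP.size])
    return (now_us - sent_us) / 1000.0
```

The sender needs no per-ping state this way: the reply carries its own
send time back, and 8 bytes is all it costs.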

Even with this sensible amount of extra ping payload, there is still
plenty of "bandwidth allocation" available to account for
encapsulations:  IPIP, VPN, MPLS, Ethernet framing, ATM framing, 

I can see a network operator with a complex mesh network wanting to turn
on Record Route (RFC791), which adds 24+(hops*32, max 536) bits to both
ping and ping-response packets.

So my 1500 bits for ping was not bad Tennessee windage for the
application described by the original poster, plus comments added by
others.  In fact, it would overestimate the bandwidth required, but not
by that much.

As for how much the use of ping would affect the CPU loading of the
device, that would depend a great deal on the implementation of the
TCP/IP stack in the CPE.  When I wrote _Linux IP Stacks Commentary_, the
code to implement ping was packet receipt, a very small block of code to
build the reply packet, and packet send of the above-mentioned 64 bytes.

Consider another service:  Network Time Protocol.  Unlike ping, there is
quite a bit of CPU load to process the time information through the
smoothing filters.  (Counter argument: a properly implemented version of
NTP will send time requests with separations of 60-1024 seconds.)



Re: Pinging a Device Every Second

2018-12-16 Thread Saku Ytti
On Sun, 16 Dec 2018 at 00:48, Stephen Satchell  wrote:

> The 1500 bits are for each ping.  So 1000 hosts would be 1,500,000 bits

Why? Why did you choose 1500b(it) ping, instead of minimum size or
1500B(ytes) IP packets?

Minimum: 672kbps
1500B: 12.16Mbps

-- 
  ++ytti


Re: Pinging a Device Every Second

2018-12-15 Thread Baldur Norddahl
Hi

Customers do not usually complain about 2 minutes of downtime unless it is
a repeating event. We will therefore offer such customers to put their line
on monitor mode, which means we will add them to smokeping. You could also
start the ping once a second thing, which would be no problem if it is only
a few customers on monitor mode.

However 2 minutes of downtime is a symptom of bad wifi more often than the
internet connection.

Regards,

Baldur


On Sat, Dec 15, 2018 at 7:33 PM Colton Conor  wrote:

> The problem I am trying to solve is to accurately be able to tell a
> customer if their home internet connection was up or down.  Example,
> customer calls in and says my internet was down for 2 minutes yesterday. We
> need to be able to verify that their internet connection was indeed down.
> Right now we have no easy way to do this.  Getting metrics like packet loss
> and jitter would be great too, though I realize ICMP data path does not
> always equal customer experience as many network device prioritize ICMP
> traffic. However ICMP pings over the internet do usually accurately tell if
> a customers modem is indeed online or not.
>
> Most devices out in the field like ONT's and DSL modems do not support
> SNMP but rather use TR-069 for management. Most of these devices only check
> into the TR-069 ACS server once a day.
> If the consumer device does support SNMP, they usually have weak broadcom
> or qualcom SoC processors, outdated linux kernel embedded operating
> systems, limited ram, and storage. Most of these can't handle SNMP walks
> every minute let alone every 5. We are talking about sub $100 routers here
> not Juniper, Cisco, Arista, etc.
>
> Most all of these consumer devices are connected to an carrier aggregation
> device like a DSLAM, OLT, ethernet switch, or wireless access point. These
> access devices do support SNMP, but most manufactures recommend only 5
> minute SNMP poling, so a 2 minute outage would not easily be detected. Plus
> its hard to correlate that consumer X is on port Y on access switch, and
> get that right for a tier 1 CSR.
>
> The only two ways I think I can accomplish this is:
> 1. ICMP pings to a device every so many seconds. Almost every device
> supports responding to WAN ICMP pings.
> or
> 2. IPFIX sampling at core router, and then drilling down by customer IP. I
> think this will tell me if any data was flowing to this customers IP on a
> second by second basis, but won't necessarily give us an up or down
> indicator. Requires nothing from the consumer's router.
>
>
>
>
>
> On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell 
> wrote:
>
>> On 12/15/18 7:48 AM, Colton Conor wrote:
>> > How much compute and network resources does it take for a NMS to:
>> >
>> > 1. ICMP ping a device every second
>> > 2. Record these results.
>> > 3. Report an alarm after so many seconds of missed pings.
>> >
>> > We are looking for a system to in near real-time monitor if an end
>> > customers router is up or down. SNMP I assume would be too resource
>> > intensive, so ICMP pings seem like the only logical solution.
>> >
>> > The question is once a second pings too polling on an NMS and a consumer
>> > grade router? Does it take much network bandwidth and CPU resources from
>> > both the NMS and CPE side?
>> >
>> > Lets say this is for a 1,000 customer ISP.
>>
>> What problem are you trying to solve, exactly?  That more than anything
>> will dictate what you do.
>>
>> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
>> remote device is almost invisible.  Remember the only real difference
>> between ping and SNMP monitoring (UDP) is the organization of the bits
>> in the packet and the protocol number in the IP header.  It's still one
>> packet pair exchanged, unless you get really ambitious with your SNMP
>> OID list.
>>
>> When I was in a medium-sized hosting company, I developed an SNMP-based
>> monitoring system that would query a number of load parameters (CPU,
>> disk, network, overall) on a once a minute schedule, and would keep
>> history for hours on the monitoring server.  The boss fretted about the
>> load such monitoring would impose.  He never saw any.
>>
>> For pure link monitoring, which is what I'm hearing you want to do, in
>> my experience I found that a six-second ping cycle gives lots of early
>> warning for link failures.  Again, it depends on the specifications and
>> detection targets.
>>
>> Some things to consider:
>>
>> 1.  Router restarts take a while.  Consumer-grade routers can take a
>> minute or more to complete a restart to the point where it will respond
>> to ping.  Carrier-grade routers are more variable but in general have so
>> many options built into them that it takes longer to complete a restart
>> cycle.  Since you are talking consumer-grade gear, you probably don't
>> want to be sensitive to CP power sags.
>>
>> 2.  Depending on the technology used on the link, you may get some
>> short-term outages, on the order of 

Re: Pinging a Device Every Second

2018-12-15 Thread Stephen Satchell
On 12/15/18 12:03 PM, Saku Ytti wrote:
> On Sat, 15 Dec 2018 at 18:52, Stephen Satchell  wrote:
> 
>> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
> 
> I can't parse this.
> 
> 1000 hosts at 1 pps would be 672kbps on ethernetII encapulation with
> minimum size frames.
> 

The 1500 bits are for each ping.  So 1000 hosts would be 1,500,000 bits
per ping cycle at the monitor server, but not on each leaf router.  The
designer would need to analyze the network topology to see if there are
any possible choke points.  In a cable internet system, 1000 customers
on a single up/down channel pair would require 700 kilobits each way per
ping cycle.  Yes, this is payload bandwidth, it doesn't include packet
overhead.


Re: Pinging a Device Every Second

2018-12-15 Thread valdis . kletnieks
On Sat, 15 Dec 2018 12:20:01 -0700, Raymond Burkholder said:
> Another aspect is congestion.  Large uploads or downloads can cause 
> packet loss (including dropping the pings with which you are testing).  
> Therefore management packets such as these could be marked and 
> processed, on your side at least, with a higher priority.

How much depends on whether the CPE gear has software recent enough
to avoid massive bufferbloat.



Re: Pinging a Device Every Second

2018-12-15 Thread Keith Stokes
I have a Nagios installation running on a PIII with maybe 512 MB of RAM.

I ping a couple hundred devices 5 times per minute and have an alarm threshold 
of no response for 3 minutes which sends an e-mail.
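
The alarm logic itself is tiny. Below is a sketch in Python of a
"no response for N seconds" threshold; it is not Nagios code, and the
class and method names are made up. Sending the e-mail is left to the
caller:

```python
class DownDetector:
    """Raise one 'down' event after threshold_s seconds without a reply."""

    def __init__(self, threshold_s=180.0):
        self.threshold_s = threshold_s
        self.last_reply = {}  # host -> time of last reply (or first probe)
        self.alarmed = set()

    def observe(self, host, now, answered):
        """Return "down", "up", or None when nothing changed."""
        if answered:
            self.last_reply[host] = now
            if host in self.alarmed:
                self.alarmed.discard(host)
                return "up"
            return None
        # A host never seen before starts its silence window at this probe.
        since = self.last_reply.setdefault(host, now)
        if host not in self.alarmed and now - since >= self.threshold_s:
            self.alarmed.add(host)
            return "down"
        return None
```

Probing every 12 seconds (5 times per minute), a host that stops
answering produces exactly one "down" event once 3 minutes have passed
since its last reply, and one "up" event when it answers again.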

The same device also checks about 900 services among those 200 devices mostly 
every minute with some every 15 - 60 minutes.

This machine happens to be on a backup measured circuit with one other small 
service.

The ISP has measured my 90% bandwidth rate at < 20K for years. That includes the 
monitoring, the other low usage service, multiple machines hitting the web 
interface to check status and the outbound e-mails.

--

Keith Stokes
SalonBiz, Inc


On Dec 15, 2018, at 12:33 PM, Colton Conor wrote:

The problem I am trying to solve is to accurately be able to tell a customer if 
their home internet connection was up or down.  Example, customer calls in and 
says my internet was down for 2 minutes yesterday. We need to be able to verify 
that their internet connection was indeed down. Right now we have no easy way 
to do this.  Getting metrics like packet loss and jitter would be great too, 
though I realize ICMP data path does not always equal customer experience as 
many network device prioritize ICMP traffic. However ICMP pings over the 
internet do usually accurately tell if a customers modem is indeed online or 
not.

Most devices out in the field like ONT's and DSL modems do not support SNMP but 
rather use TR-069 for management. Most of these devices only check into the 
TR-069 ACS server once a day.
If the consumer device does support SNMP, they usually have weak broadcom or 
qualcom SoC processors, outdated linux kernel embedded operating systems, 
limited ram, and storage. Most of these can't handle SNMP walks every minute 
let alone every 5. We are talking about sub $100 routers here not Juniper, 
Cisco, Arista, etc.

Most all of these consumer devices are connected to an carrier aggregation 
device like a DSLAM, OLT, ethernet switch, or wireless access point. These 
access devices do support SNMP, but most manufactures recommend only 5 minute 
SNMP poling, so a 2 minute outage would not easily be detected. Plus its hard 
to correlate that consumer X is on port Y on access switch, and get that right 
for a tier 1 CSR.

The only two ways I think I can accomplish this is:
1. ICMP pings to a device every so many seconds. Almost every device supports 
responding to WAN ICMP pings.
or
2. IPFIX sampling at core router, and then drilling down by customer IP. I 
think this will tell me if any data was flowing to this customers IP on a 
second by second basis, but won't necessarily give us an up or down indicator. 
Requires nothing from the consumer's router.





On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell wrote:
On 12/15/18 7:48 AM, Colton Conor wrote:
> How much compute and network resources does it take for a NMS to:
>
> 1. ICMP ping a device every second
> 2. Record these results.
> 3. Report an alarm after so many seconds of missed pings.
>
> We are looking for a system to in near real-time monitor if an end
> customers router is up or down. SNMP I assume would be too resource
> intensive, so ICMP pings seem like the only logical solution.
>
> The question is once a second pings too polling on an NMS and a consumer
> grade router? Does it take much network bandwidth and CPU resources from
> both the NMS and CPE side?
>
> Lets say this is for a 1,000 customer ISP.

What problem are you trying to solve, exactly?  That more than anything
will dictate what you do.

Short answer: about 1500 bits of bandwidth, and the CPU loading on the
remote device is almost invisible.  Remember the only real difference
between ping and SNMP monitoring (UDP) is the organization of the bits
in the packet and the protocol number in the IP header.  It's still one
packet pair exchanged, unless you get really ambitious with your SNMP
OID list.

When I was in a medium-sized hosting company, I developed an SNMP-based
monitoring system that would query a number of load parameters (CPU,
disk, network, overall) on a once a minute schedule, and would keep
history for hours on the monitoring server.  The boss fretted about the
load such monitoring would impose.  He never saw any.

For pure link monitoring, which is what I'm hearing you want to do, in
my experience I found that a six-second ping cycle gives lots of early
warning for link failures.  Again, it depends on the specifications and
detection targets.

Some things to consider:

1.  Router restarts take a while.  Consumer-grade routers can take a
minute or more to complete a restart to the point where it will respond
to ping.  Carrier-grade routers are more variable but in general have so
many options built into them that it takes longer to complete a restart
cycle.  Since you are talking consumer-grade gear, you probably don't
want to be sensitive to CP power sags.

2.  Depending on 

Re: Pinging a Device Every Second

2018-12-15 Thread Saku Ytti
On Sat, 15 Dec 2018 at 18:52, Stephen Satchell  wrote:

> Short answer: about 1500 bits of bandwidth, and the CPU loading on the

I can't parse this.

1000 hosts at 1 pps would be 672kbps on ethernetII encapulation with
minimum size frames.
-- 
  ++ytti


Re: Pinging a Device Every Second

2018-12-15 Thread Raymond Burkholder

On 2018-12-15 11:32 a.m., Colton Conor wrote:
The problem I am trying to solve is to accurately be able to tell a 
customer if their home internet connection was up or down.  Example, 
customer calls in and says my internet was down for 2 minutes 
yesterday. We need to be able to verify that their internet connection 
was indeed down. Right now we have no easy way to do this. Getting 
metrics like packet loss and jitter would be great too, though I 
realize ICMP data path does not always equal customer experience as 
many network device prioritize ICMP traffic. However ICMP pings over 
the internet do usually accurately tell if a customers modem is indeed 
online or not.


I've found that this is a multi-faceted problem.

Looking at pings or smokeping is part of the solution, but may cause 
false negatives themselves, when considering the next point --


Another aspect is congestion.  Large uploads or downloads can cause 
packet loss (including dropping the pings with which you are testing).  
Therefore management packets such as these could be marked and 
processed, on your side at least, with a higher priority.
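
On the marking side, a probe can set a DSCP value on its own socket. A
minimal Python sketch, assuming a platform that exposes IP_TOS (Linux
does); the choice of CS6 (48) is only an example, the helper names are
made up, and whether anything honours the mark depends entirely on the
QoS configuration along the path:

```python
import socket


def tos_byte(dscp):
    """DSCP occupies the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2


def mark_socket(sock, dscp):
    """Ask the local stack to send this socket's packets with the DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))
```

For example, mark_socket(sock, 48) requests CS6, which is a TOS byte of
192 (0xC0).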


Someone else mentioned radius (or similar authentication/authorization 
logging mechanism), which will provide an answer if the session did in 
fact drop or not, for those types of connections.


DHCP address changeovers can cause outages.

It has also been common to get the 'internet is out' call when DNS is 
unavailable for whatever reason.  Without name resolution, most eyeball 
functions will fail.


Raymond.


Re: Pinging a Device Every Second

2018-12-15 Thread Dave Bell
Is RADIUS accounting an option here?

Dave

On Sat, 15 Dec 2018 at 18:32, Colton Conor  wrote:

> The problem I am trying to solve is to accurately be able to tell a
> customer if their home internet connection was up or down.  Example,
> customer calls in and says my internet was down for 2 minutes yesterday. We
> need to be able to verify that their internet connection was indeed down.
> Right now we have no easy way to do this.  Getting metrics like packet loss
> and jitter would be great too, though I realize ICMP data path does not
> always equal customer experience as many network device prioritize ICMP
> traffic. However ICMP pings over the internet do usually accurately tell if
> a customers modem is indeed online or not.
>
> Most devices out in the field like ONT's and DSL modems do not support
> SNMP but rather use TR-069 for management. Most of these devices only check
> into the TR-069 ACS server once a day.
> If the consumer device does support SNMP, they usually have weak broadcom
> or qualcom SoC processors, outdated linux kernel embedded operating
> systems, limited ram, and storage. Most of these can't handle SNMP walks
> every minute let alone every 5. We are talking about sub $100 routers here
> not Juniper, Cisco, Arista, etc.
>
> Most all of these consumer devices are connected to an carrier aggregation
> device like a DSLAM, OLT, ethernet switch, or wireless access point. These
> access devices do support SNMP, but most manufactures recommend only 5
> minute SNMP poling, so a 2 minute outage would not easily be detected. Plus
> its hard to correlate that consumer X is on port Y on access switch, and
> get that right for a tier 1 CSR.
>
> The only two ways I think I can accomplish this is:
> 1. ICMP pings to a device every so many seconds. Almost every device
> supports responding to WAN ICMP pings.
> or
> 2. IPFIX sampling at core router, and then drilling down by customer IP. I
> think this will tell me if any data was flowing to this customers IP on a
> second by second basis, but won't necessarily give us an up or down
> indicator. Requires nothing from the consumer's router.
>
>
>
>
>
> On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell 
> wrote:
>
>> On 12/15/18 7:48 AM, Colton Conor wrote:
>> > How much compute and network resources does it take for a NMS to:
>> >
>> > 1. ICMP ping a device every second
>> > 2. Record these results.
>> > 3. Report an alarm after so many seconds of missed pings.
>> >
>> > We are looking for a system to in near real-time monitor if an end
>> > customers router is up or down. SNMP I assume would be too resource
>> > intensive, so ICMP pings seem like the only logical solution.
>> >
>> > The question is once a second pings too polling on an NMS and a consumer
>> > grade router? Does it take much network bandwidth and CPU resources from
>> > both the NMS and CPE side?
>> >
>> > Lets say this is for a 1,000 customer ISP.
>>
>> What problem are you trying to solve, exactly?  That more than anything
>> will dictate what you do.
>>
>> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
>> remote device is almost invisible.  Remember the only real difference
>> between ping and SNMP monitoring (UDP) is the organization of the bits
>> in the packet and the protocol number in the IP header.  It's still one
>> packet pair exchanged, unless you get really ambitious with your SNMP
>> OID list.
>>
>> When I was in a medium-sized hosting company, I developed an SNMP-based
>> monitoring system that would query a number of load parameters (CPU,
>> disk, network, overall) on a once a minute schedule, and would keep
>> history for hours on the monitoring server.  The boss fretted about the
>> load such monitoring would impose.  He never saw any.
>>
>> For pure link monitoring, which is what I'm hearing you want to do, in
>> my experience I found that a six-second ping cycle gives lots of early
>> warning for link failures.  Again, it depends on the specifications and
>> detection targets.
>>
>> Some things to consider:
>>
>> 1.  Router restarts take a while.  Consumer-grade routers can take a
>> minute or more to complete a restart to the point where it will respond
>> to ping.  Carrier-grade routers are more variable but in general have so
>> many options built into them that it takes longer to complete a restart
>> cycle.  Since you are talking consumer-grade gear, you probably don't
>> want to be sensitive to CP power sags.
>>
>> 2.  Depending on the technology used on the link, you may get some
>> short-term outages, on the order of seconds, so doing "rapid" pings do
>> nothing for you.  During my DSL time, ATM would drop out for short
>> intervals -- so watch out for nuisance trips.
>>
>> 3.  Some routers implement ping limiting, so you have to balance your
>> monitoring sample rate against DoS susceptibility. Offhand, I don't know
>> the granularity of consumer router ping limiting, as I've never had that
>> question pop up.
>>
>> 4.  How 

Re: Pinging a Device Every Second

2018-12-15 Thread Aaron1
I think the guys in the NOC will add a customer CPE to SolarWinds monitoring 
and just have it run pings continually, and set up an alert so that as soon as 
the pings stop, the alerts go to email or wherever.

Aaron

> On Dec 15, 2018, at 12:32 PM, Colton Conor  wrote:
> 
> The problem I am trying to solve is to accurately be able to tell a customer 
> if their home internet connection was up or down.  Example, customer calls in 
> and says my internet was down for 2 minutes yesterday. We need to be able to 
> verify that their internet connection was indeed down. Right now we have no 
> easy way to do this.  Getting metrics like packet loss and jitter would be 
> great too, though I realize ICMP data path does not always equal customer 
> experience as many network device prioritize ICMP traffic. However ICMP pings 
> over the internet do usually accurately tell if a customers modem is indeed 
> online or not.  
> 
> Most devices out in the field like ONT's and DSL modems do not support SNMP 
> but rather use TR-069 for management. Most of these devices only check into 
> the TR-069 ACS server once a day. 
> If the consumer device does support SNMP, they usually have weak broadcom or 
> qualcom SoC processors, outdated linux kernel embedded operating systems, 
> limited ram, and storage. Most of these can't handle SNMP walks every minute 
> let alone every 5. We are talking about sub $100 routers here not Juniper, 
> Cisco, Arista, etc. 
> 
> Almost all of these consumer devices are connected to a carrier aggregation 
> device like a DSLAM, OLT, ethernet switch, or wireless access point. These 
> access devices do support SNMP, but most manufacturers recommend only 5 minute 
> SNMP polling, so a 2 minute outage would not easily be detected. Plus it's hard 
> to correlate that consumer X is on port Y on access switch, and get that 
> right for a tier 1 CSR. 
> 
> The only two ways I think I can accomplish this is:
> 1. ICMP pings to a device every so many seconds. Almost every device supports 
> responding to WAN ICMP pings. 
> or 
> 2. IPFIX sampling at core router, and then drilling down by customer IP. I 
> think this will tell me if any data was flowing to this customers IP on a 
> second by second basis, but won't necessarily give us an up or down 
> indicator. Requires nothing from the consumer's router. 
> 
> 
> 
> 
> 
>> On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell  wrote:
>> On 12/15/18 7:48 AM, Colton Conor wrote:
>> > How much compute and network resources does it take for a NMS to:
>> > 
>> > 1. ICMP ping a device every second
>> > 2. Record these results.
>> > 3. Report an alarm after so many seconds of missed pings.
>> > 
>> > We are looking for a system to in near real-time monitor if an end
>> > customers router is up or down. SNMP I assume would be too resource
>> > intensive, so ICMP pings seem like the only logical solution.
>> > 
>> > The question is once a second pings too polling on an NMS and a consumer
>> > grade router? Does it take much network bandwidth and CPU resources from
>> > both the NMS and CPE side?
>> > 
>> > Lets say this is for a 1,000 customer ISP.
>> 
>> What problem are you trying to solve, exactly?  That more than anything
>> will dictate what you do.
>> 
>> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
>> remote device is almost invisible.  Remember the only real difference
>> between ping and SNMP monitoring (UDP) is the organization of the bits
>> in the packet and the protocol number in the IP header.  It's still one
>> packet pair exchanged, unless you get really ambitious with your SNMP
>> OID list.
>> 
>> When I was in a medium-sized hosting company, I developed an SNMP-based
>> monitoring system that would query a number of load parameters (CPU,
>> disk, network, overall) on a once a minute schedule, and would keep
>> history for hours on the monitoring server.  The boss fretted about the
>> load such monitoring would impose.  He never saw any.
>> 
>> For pure link monitoring, which is what I'm hearing you want to do, in
>> my experience I found that a six-second ping cycle gives lots of early
>> warning for link failures.  Again, it depends on the specifications and
>> detection targets.
>> 
>> Some things to consider:
>> 
>> 1.  Router restarts take a while.  Consumer-grade routers can take a
>> minute or more to complete a restart to the point where it will respond
>> to ping.  Carrier-grade routers are more variable but in general have so
>> many options built into them that it takes longer to complete a restart
>> cycle.  Since you are talking consumer-grade gear, you probably don't
>> want to be sensitive to CP power sags.
>> 
>> 2.  Depending on the technology used on the link, you may get some
>> short-term outages, on the order of seconds, so doing "rapid" pings do
>> nothing for you.  During my DSL time, ATM would drop out for short
>> intervals -- so watch out for nuisance trips.
>> 
>> 3.  Some routers implement 

Re: Pinging a Device Every Second

2018-12-15 Thread Colton Conor
The problem I am trying to solve is being able to tell a customer
accurately whether their home internet connection was up or down.  For
example, a customer calls in and says "my internet was down for 2 minutes
yesterday."  We need to be able to verify that their internet connection
was indeed down.  Right now we have no easy way to do this.  Getting
metrics like packet loss and jitter would be great too, though I realize
the ICMP data path does not always equal customer experience, as many
network devices prioritize ICMP traffic. However, ICMP pings over the
internet do usually tell accurately whether a customer's modem is online.

Most devices out in the field, like ONTs and DSL modems, do not support SNMP
but rather use TR-069 for management. Most of these devices only check in
with the TR-069 ACS server once a day.
If a consumer device does support SNMP, it usually has a weak Broadcom
or Qualcomm SoC processor, an outdated embedded Linux kernel, and limited
RAM and storage. Most of these can't handle SNMP walks every 5 minutes,
let alone every minute. We are talking about sub-$100 routers here,
not Juniper, Cisco, Arista, etc.

Almost all of these consumer devices are connected to a carrier aggregation
device like a DSLAM, OLT, Ethernet switch, or wireless access point. These
access devices do support SNMP, but most manufacturers recommend only
5-minute SNMP polling, so a 2-minute outage would not easily be detected.
Plus, it's hard to correlate that consumer X is on port Y of an access
switch, and to get that right for a tier 1 CSR.

The only two ways I think I can accomplish this are:
1. ICMP pings to a device every so many seconds. Almost every device
supports responding to WAN ICMP pings.
or
2. IPFIX sampling at the core router, and then drilling down by customer IP.
I think this will tell me if any data was flowing to this customer's IP on a
second-by-second basis, but it won't necessarily give us an up or down
indicator. It requires nothing from the consumer's router.
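
To make the "was the customer really down for 2 minutes yesterday" check
concrete, here is a minimal Python sketch that turns a recorded ping history
into outage windows. The record format (a timestamp plus a reply/no-reply
flag) and the names outage_windows and min_missed are assumptions made for
illustration, not part of any particular NMS.

```python
from typing import Iterable, List, Tuple


def outage_windows(samples: Iterable[Tuple[float, bool]],
                   min_missed: int = 3) -> List[Tuple[float, float]]:
    """Return (first_missed_ts, last_missed_ts) for each run of at least
    min_missed consecutive lost pings. Samples must be in time order."""
    windows = []
    run_start = None   # timestamp of the first lost ping in the current run
    run_last = None    # timestamp of the most recent lost ping in the run
    run_len = 0        # number of consecutive lost pings so far
    for ts, replied in samples:
        if replied:
            # A reply ends the current run; keep it only if it was long enough.
            if run_len >= min_missed:
                windows.append((run_start, run_last))
            run_start, run_last, run_len = None, None, 0
        else:
            if run_len == 0:
                run_start = ts
            run_last = ts
            run_len += 1
    # The history may end while the device is still down.
    if run_len >= min_missed:
        windows.append((run_start, run_last))
    return windows
```

A CSR-facing tool could then compare the window the customer reports against
the returned list. Short runs below min_missed are ignored on purpose, so a
single lost ping does not show up as an outage.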





On Sat, Dec 15, 2018 at 10:51 AM Stephen Satchell  wrote:

> On 12/15/18 7:48 AM, Colton Conor wrote:
> > How much compute and network resources does it take for a NMS to:
> >
> > 1. ICMP ping a device every second
> > 2. Record these results.
> > 3. Report an alarm after so many seconds of missed pings.
> >
> > We are looking for a system to in near real-time monitor if an end
> > customers router is up or down. SNMP I assume would be too resource
> > intensive, so ICMP pings seem like the only logical solution.
> >
> > The question is once a second pings too polling on an NMS and a consumer
> > grade router? Does it take much network bandwidth and CPU resources from
> > both the NMS and CPE side?
> >
> > Lets say this is for a 1,000 customer ISP.
>
> What problem are you trying to solve, exactly?  That more than anything
> will dictate what you do.
>
> Short answer: about 1500 bits of bandwidth, and the CPU loading on the
> remote device is almost invisible.  Remember the only real difference
> between ping and SNMP monitoring (UDP) is the organization of the bits
> in the packet and the protocol number in the IP header.  It's still one
> packet pair exchanged, unless you get really ambitious with your SNMP
> OID list.
>
> When I was in a medium-sized hosting company, I developed an SNMP-based
> monitoring system that would query a number of load parameters (CPU,
> disk, network, overall) on a once a minute schedule, and would keep
> history for hours on the monitoring server.  The boss fretted about the
> load such monitoring would impose.  He never saw any.
>
> For pure link monitoring, which is what I'm hearing you want to do, in
> my experience I found that a six-second ping cycle gives lots of early
> warning for link failures.  Again, it depends on the specifications and
> detection targets.
>
> Some things to consider:
>
> 1.  Router restarts take a while.  Consumer-grade routers can take a
> minute or more to complete a restart to the point where it will respond
> to ping.  Carrier-grade routers are more variable but in general have so
> many options built into them that it takes longer to complete a restart
> cycle.  Since you are talking consumer-grade gear, you probably don't
> want to be sensitive to CP power sags.
>
> 2.  Depending on the technology used on the link, you may get some
> short-term outages, on the order of seconds, so doing "rapid" pings do
> nothing for you.  During my DSL time, ATM would drop out for short
> intervals -- so watch out for nuisance trips.
>
> 3.  Some routers implement ping limiting, so you have to balance your
> monitoring sample rate against DoS susceptibility. Offhand, I don't know
> the granularity of consumer router ping limiting, as I've never had that
> question pop up.
>
> 4.  How large a monitoring server are you willing to devote to such a
> system?  My web host monitoring used a 400-MHz Pentium II box, and it
> didn't even breathe hard.  (A 1U Cobalt box, repurposed with Red Hat
> Linux, pulled from a 

Re: Pinging a Device Every Second

2018-12-15 Thread Tim Pozar
At one of my client companies, we use LibreNMS.  It is normally used to
get SNMP data, but we also have it configured to ping our more "high
touch" clients' routers.  In that case we can record performance such as
latency and packet loss.  It will generate graphs that we can pass on to
the client.  It can also be set to alert us if a client's router is not
pingable.

LibreNMS can also integrate Smokeping if you want Smokeping-style graphs
showing standard deviation, etc.

Currently I am running LibreNMS on a VM on a Proxmox cluster with a
couple of cores.  It is probing 385 devices every 5 minutes and keeping
up with that.  In polling, SNMP is the real time and CPU hog, whereas ping
is pretty low impact.

Tim

On 12/15/18 9:37 AM, Baldur Norddahl wrote:
> You could configure BFD to send out an SNMP alert when three packets have
> been missed on a 50 ms cycle. Or instantly if the interface changes
> state to down. This way you would know that they are down within 150 ms.
> 
> BFD is the hardware solution. A Linux box that has to ping 1000
> addresses per second will be very taxed and likely unable to do that in
> a stable way. You will have seconds where it fails to do them all
> followed by seconds where it attempts to do them more than once. The
> result is that the statistics gathered are worthless. If you do something
> like this, it is much better to have a less ambitious 1 minute cycle.
> 
> Take a look at Smokeping. If you want a graph to show the quality of the
> line, Smokeping makes some very good graphs for that. 
> 
> Regards 
> Baldur 
> 
> On 15 Dec 2018 at 16:49, "Colton Conor" wrote:
> 
> How much compute and network resources does it take for a NMS to:
> 
> 1. ICMP ping a device every second
> 2. Record these results.
> 3. Report an alarm after so many seconds of missed pings. 
> 
> We are looking for a system to in near real-time monitor if an end
> customers router is up or down. SNMP I assume would be too resource
> intensive, so ICMP pings seem like the only logical solution.
> 
> The question is once a second pings too polling on an NMS and a
> consumer grade router? Does it take much network bandwidth and CPU
> resources from both the NMS and CPE side?
> 
> Lets say this is for a 1,000 customer ISP.
> 
> 
> 


Re: Pinging a Device Every Second

2018-12-15 Thread Mark Tinka



On 15/Dec/18 19:37, Baldur Norddahl wrote:


>
>
> BFD is the hardware solution.

Don't remind me that Juniper currently don't support BFD in hardware for
IS-IS- or OSPFv3-signaled IPv6 routing :-(.

Mark.


Re: Pinging a Device Every Second

2018-12-15 Thread Baldur Norddahl
You could configure BFD to send out an SNMP alert when three packets have
been missed on a 50 ms cycle, or instantly if the interface changes state
to down. This way you would know that they are down within 150 ms.

BFD is the hardware solution. A Linux box that has to ping 1000 addresses
per second will be very taxed and likely unable to do that in a stable way.
You will have seconds where it fails to do them all followed by seconds
where it attempts to do them more than once. The result is that the
statistics gathered are worthless. If you do something like this, it is much
better to have a less ambitious 1 minute cycle.
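
The 150 ms figure above is just the probe interval multiplied by the number
of misses tolerated, and the same arithmetic applies to a ping cycle. The
Python sketch below shows that calculation, plus one common way of spreading
probes evenly across a cycle instead of sending them in a burst. The
spreading approach and the function names are my own assumptions for
illustration; they are not something BFD or any specific poller prescribes.

```python
def detection_time_ms(interval_ms: float, missed_before_alarm: int) -> float:
    """Approximate time to declare a target down:
    probe interval multiplied by the number of misses tolerated."""
    return interval_ms * missed_before_alarm


def probe_offsets(num_targets: int, cycle_seconds: float) -> list:
    """Evenly spaced send offsets (in seconds) within one polling cycle,
    one per target, so the probes are not all sent at the same instant."""
    if num_targets <= 0:
        return []
    return [i * cycle_seconds / num_targets for i in range(num_targets)]


# Three missed packets on a 50 ms cycle, as in the BFD example above.
bfd_detection = detection_time_ms(50, 3)

# Three missed pings on a 1 second cycle, for comparison.
ping_detection = detection_time_ms(1000, 3)

# 1000 targets spread across a 1 second cycle: one probe every millisecond.
offsets = probe_offsets(1000, 1.0)
```

Whether a given Linux box keeps up with such a schedule still has to be
measured; the sketch only shows the intended timing.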

Take a look at Smokeping. If you want a graph to show the quality of the
line, Smokeping makes some very good graphs for that.

Regards
Baldur

On 15 Dec 2018 at 16:49, "Colton Conor" wrote:

How much compute and network resources does it take for a NMS to:

1. ICMP ping a device every second
2. Record these results.
3. Report an alarm after so many seconds of missed pings.

We are looking for a system to in near real-time monitor if an end
customers router is up or down. SNMP I assume would be too resource
intensive, so ICMP pings seem like the only logical solution.

The question is once a second pings too polling on an NMS and a consumer
grade router? Does it take much network bandwidth and CPU resources from
both the NMS and CPE side?

Lets say this is for a 1,000 customer ISP.


Re: Pinging a Device Every Second

2018-12-15 Thread Stephen Satchell
On 12/15/18 7:48 AM, Colton Conor wrote:
> How much compute and network resources does it take for a NMS to:
> 
> 1. ICMP ping a device every second
> 2. Record these results.
> 3. Report an alarm after so many seconds of missed pings.
> 
> We are looking for a system to in near real-time monitor if an end
> customers router is up or down. SNMP I assume would be too resource
> intensive, so ICMP pings seem like the only logical solution.
> 
> The question is once a second pings too polling on an NMS and a consumer
> grade router? Does it take much network bandwidth and CPU resources from
> both the NMS and CPE side?
> 
> Lets say this is for a 1,000 customer ISP.

What problem are you trying to solve, exactly?  That more than anything
will dictate what you do.

Short answer: about 1500 bits of bandwidth, and the CPU loading on the
remote device is almost invisible.  Remember the only real difference
between ping and SNMP monitoring (UDP) is the organization of the bits
in the packet and the protocol number in the IP header.  It's still one
packet pair exchanged, unless you get really ambitious with your SNMP
OID list.

When I was in a medium-sized hosting company, I developed an SNMP-based
monitoring system that would query a number of load parameters (CPU,
disk, network, overall) on a once a minute schedule, and would keep
history for hours on the monitoring server.  The boss fretted about the
load such monitoring would impose.  He never saw any.

For pure link monitoring, which is what I'm hearing you want to do, in
my experience I found that a six-second ping cycle gives lots of early
warning for link failures.  Again, it depends on the specifications and
detection targets.

Some things to consider:

1.  Router restarts take a while.  Consumer-grade routers can take a
minute or more to complete a restart to the point where it will respond
to ping.  Carrier-grade routers are more variable but in general have so
many options built into them that it takes longer to complete a restart
cycle.  Since you are talking consumer-grade gear, you probably don't
want to be sensitive to CP power sags.

2.  Depending on the technology used on the link, you may get some
short-term outages, on the order of seconds, so doing "rapid" pings does
nothing for you.  During my DSL time, ATM would drop out for short
intervals -- so watch out for nuisance trips.

3.  Some routers implement ping limiting, so you have to balance your
monitoring sample rate against DoS susceptibility. Offhand, I don't know
the granularity of consumer router ping limiting, as I've never had that
question pop up.

4.  How large a monitoring server are you willing to devote to such a
system?  My web host monitoring used a 400-MHz Pentium II box, and it
didn't even breathe hard.  (A 1U Cobalt box, repurposed with Red Hat
Linux, pulled from a junk pile.)  I was monitoring about 150 web host
servers. Extrapolating the system load on that Cobalt box, I could have
handled 1500 web host servers and more.
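
As a rough cross-check of the "about 1500 bits" figure, the Python sketch
below adds up one echo request and one echo reply. The sizes are
assumptions: a 56-byte payload (the default for the common Linux ping
utility), an 8-byte ICMP header, a 20-byte IPv4 header without options, and
a 14-byte Ethernet header, ignoring preamble, FCS, and any access-network
encapsulation.

```python
def ping_exchange_bits(payload_bytes: int = 56, icmp_header: int = 8,
                       ipv4_header: int = 20, l2_header: int = 14) -> int:
    """Bits on the wire for one echo request plus one echo reply,
    under the assumed header sizes."""
    frame_bytes = payload_bytes + icmp_header + ipv4_header + l2_header
    return 2 * frame_bytes * 8  # request + reply, 8 bits per byte


def aggregate_bps(targets: int, pings_per_second: float = 1.0) -> float:
    """Total bits per second, both directions combined, for all targets."""
    return targets * pings_per_second * ping_exchange_bits()


per_exchange = ping_exchange_bits()   # one request/reply pair
isp_total = aggregate_bps(1000, 1.0)  # 1,000 routers, one ping per second
```

With these assumptions one exchange is 1568 bits, and 1,000 routers pinged
once a second come to roughly 1.57 Mbit/s at the monitoring server, counting
both directions.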



Pinging a Device Every Second

2018-12-15 Thread Colton Conor
How much compute and network resources does it take for an NMS to:

1. ICMP ping a device every second
2. Record these results.
3. Report an alarm after so many seconds of missed pings.

We are looking for a system to monitor, in near real time, whether an end
customer's router is up or down. I assume SNMP would be too resource
intensive, so ICMP pings seem like the only logical solution.

The question is: are once-a-second pings too much polling for an NMS and a
consumer grade router? Does it take much network bandwidth and CPU
resources on both the NMS and CPE side?

Let's say this is for a 1,000-customer ISP.
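
Step 3 above (alarm after so many seconds of missed pings) can be sketched
as a small per-device counter. This is only an illustration in Python: the
class name MissedPingAlarm, the threshold default, and the "DOWN"/"UP" event
strings are assumptions, and sending the pings and recording the results
(steps 1 and 2) are left out.

```python
from typing import Optional


class MissedPingAlarm:
    """Tracks consecutive missed pings for one device and reports
    state changes once a threshold is crossed."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold  # consecutive misses before alarming
        self.missed = 0             # current run of missed pings
        self.alarmed = False        # True while the device is considered down

    def record(self, reply_received: bool) -> Optional[str]:
        """Feed one ping result; return "DOWN" or "UP" on a state change,
        otherwise None."""
        if reply_received:
            self.missed = 0
            if self.alarmed:
                self.alarmed = False
                return "UP"
            return None
        self.missed += 1
        if self.missed >= self.threshold and not self.alarmed:
            self.alarmed = True
            return "DOWN"
        return None
```

With one ping per second, threshold is effectively the number of seconds of
silence tolerated before the alarm fires. One instance per customer router
is cheap, since it only holds two integers and a flag.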