OK, here are some thoughts to consider:

* Collectd (or another monitoring frontend) on each host will probably send measurements at least once per minute (VNFs may have longer cycles, but for bare-metal hosts I imagine we would want measurements at least every minute). Those would make an easy trigger for proxied host heartbeats from the VES Agent.
* No report from collectd means the host may be down.
* To keep collectd itself from causing a false-positive heartbeat failure, it should run in a container managed by a framework like Kubernetes, so that if/when it fails it is automatically respawned on the same host.
* For heartbeats from VNFs, the heartbeat should be integrated into the VNF code, so that if the VNF is dead or a zombie, the heartbeat fails. It should not be issued by a separate process on the VNF host (VM or container), since that may result in a false negative.
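To illustrate the last point, here is a minimal Python sketch of a heartbeat driven from inside the VNF's own processing loop rather than from a separate process. InProcessHeartbeat, run_vnf and the send callback are hypothetical names; in a real VNF, send would wrap whatever call the ONAP VES library provides, which is not shown here.

```python
from typing import Callable, List, Optional


class InProcessHeartbeat:
    """Hypothetical heartbeat helper called from the VNF's own main loop.

    A heartbeat is sent only when tick() is called and at least interval_s
    seconds have passed since the last one. Because tick() is driven by the
    VNF's real work, a hung or zombie VNF simply stops heartbeating.
    """

    def __init__(self, interval_s: float, send: Callable[[float], None]) -> None:
        self.interval_s = interval_s
        self.send = send  # assumed wrapper around the VES library's send call
        self.last_sent: Optional[float] = None

    def tick(self, now: float) -> bool:
        """Send a heartbeat if one is due; return True if one was sent."""
        if self.last_sent is None or now - self.last_sent >= self.interval_s:
            self.send(now)
            self.last_sent = now
            return True
        return False


def run_vnf(process_one: Callable[[], bool],
            hb: InProcessHeartbeat,
            clock: Callable[[], float]) -> None:
    """Main VNF loop: heartbeat only after each unit of real work completes."""
    while process_one():
        hb.tick(clock())


if __name__ == "__main__":
    # Simulated run: five work items finishing at fake times, then the loop exits.
    work = iter([True] * 5 + [False])
    times = iter([0.0, 30.0, 60.0, 61.0, 125.0])
    sent: List[float] = []
    run_vnf(lambda: next(work), InProcessHeartbeat(60.0, sent.append), lambda: next(times))
    print(sent)  # [0.0, 60.0, 125.0]
```

Because tick() only runs after a unit of work returns, a VNF whose loop blocks stops heartbeating; a separate heartbeat process would keep reporting "alive" in that case, which is the false negative described above.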
If that makes sense, maybe Gokul can work with Intel to update the ves_app.py Agent to proxy the heartbeats for all the hosts it gets reports from. I'll work on options for deploying the collectd containers using e.g. Kubernetes+Helm, and on creating a demo/dummy VNF that issues heartbeats via the ONAP Agent libraries (Gokul is welcome to help there as well...).

Thanks,
Bryan Sullivan | AT&T

From: GUPTA, ALOK
Sent: Monday, November 20, 2017 6:19 AM
To: SULLIVAN, BRYAN L (BRYAN L) <bryan.sulli...@research.att.com>
Cc: 'opnfv-tech-discuss@lists.opnfv.org' <opnfv-tech-discuss@lists.opnfv.org>
Subject: RE: [barometer] VES Heartbeats

Bryan:

Interesting question. It is similar to getting VNF events from an EMS or OAM VM.

The reason and rationale for the heartbeat event was to avoid sending pings/queries to the devices, and instead have DCAE analytics analyze heartbeats and metrics to determine the health status of the device (compared to what is currently done via a health-check query to the VNF). We had discussed earlier whether incoming events could be taken to mean the VNF is OK, instead of using a heartbeat, and the team felt otherwise. Fault and syslog event frequency can vary (you may not receive an event for hours if the VNF is running smoothly). With metrics, the interval can be long (5 to 15 minutes), thus the need for a heartbeat event.

The heartbeat is not from the agent but for the devices for which data is being sent. In some cases the entity forwarding the data could determine their health and create and send a HB event. As you said very well in your email, for the infrastructure scenario we may need to proxy the heartbeats for hosts not running the agent.

Hope this helps.
Regards,
Alok Gupta
732-420-7007
MT B2 3D30
ag1...@att.com

From: SULLIVAN, BRYAN L
Sent: Monday, November 20, 2017 8:55 AM
To: GUPTA, ALOK <ag1...@att.com>
Cc: 'opnfv-tech-discuss@lists.opnfv.org' <opnfv-tech-discuss@lists.opnfv.org>
Subject: [barometer] VES Heartbeats

Alok,

Thinking about the shared Agent (ves_app.py from Barometer) design, in which we don't need an agent running on each node but can use a single agent running on the local cloud that aggregates VES events from the Kafka bus, the question arises of how heartbeats are supposed to work (and what we use them for) in the VES design.

Beyond the VNF (presumably by integration of the ONAP VES library into the VNF), have you been assuming that the heartbeats represent the health of:

* The VES agent
* A host (real or virtual, whether running a VES agent or not) from which VES events are received

If the latter, we need to consider how the agent can proxy the heartbeats for hosts on which no agent is running; e.g. the agent can keep a per-host flag that is set whenever a collectd event is picked up on the Kafka bus during the heartbeat period, and send a Heartbeat report for each host at the end of the period. But in that case, couldn't DCAE derive that information anyway from what it had received? That calls into question the purpose of the heartbeat beyond the VNF itself (the obvious use case); I just need to clarify it.

Thanks,
Bryan Sullivan | AT&T
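The per-host flag approach above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from ves_app.py: the class and method names are invented, the Kafka consumer loop that would call on_collectd_event() is omitted, and the event dictionary is only loosely modeled on the VES commonEventHeader (sourceName for the proxied host, reportingEntityName for the agent). It has not been checked against any particular VES schema version.

```python
import time
import uuid
from typing import Dict, List, Optional, Set, Tuple


class HeartbeatProxy:
    """Hypothetical per-host heartbeat proxy for a shared VES agent.

    on_collectd_event() sets a per-host flag whenever a collectd measurement
    for that host is consumed (e.g. from the Kafka bus). end_of_period()
    emits one heartbeat per flagged host and clears the flags.
    """

    def __init__(self, agent_name: str) -> None:
        self.agent_name = agent_name
        self.known_hosts: Set[str] = set()  # every host ever reported
        self.seen: Set[str] = set()         # hosts reported this period
        self.sequence: Dict[str, int] = {}  # per-host counter (an assumption)

    def on_collectd_event(self, host: str) -> None:
        self.known_hosts.add(host)
        self.seen.add(host)

    def end_of_period(self, now: Optional[float] = None) -> Tuple[List[dict], List[str]]:
        """Return (heartbeat events, hosts silent this period) and reset flags."""
        now = time.time() if now is None else now
        micros = int(now * 1_000_000)  # start/last both set to emission time for simplicity
        events: List[dict] = []
        for host in sorted(self.seen):
            self.sequence[host] = self.sequence.get(host, 0) + 1
            events.append({"event": {"commonEventHeader": {
                "domain": "heartbeat",
                "eventId": str(uuid.uuid4()),
                "eventName": "heartbeat_" + host,        # hypothetical naming
                "sourceName": host,                      # the proxied host
                "reportingEntityName": self.agent_name,  # the agent doing the proxying
                "priority": "Normal",
                "sequence": self.sequence[host],
                "startEpochMicrosec": micros,
                "lastEpochMicrosec": micros,
            }}})
        silent = sorted(self.known_hosts - self.seen)
        self.seen.clear()
        return events, silent


if __name__ == "__main__":
    proxy = HeartbeatProxy("ves-agent")
    for h in ["compute-0", "compute-1", "compute-0"]:
        proxy.on_collectd_event(h)
    events, silent = proxy.end_of_period()
    print(len(events), silent)  # 2 []
```

Hosts returned in silent sent no collectd data during the period and get no heartbeat, so downstream they appear as a missed heartbeat, matching the "no report from collectd means the host may be down" case.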
_______________________________________________
opnfv-tech-discuss mailing list
opnfv-tech-discuss@lists.opnfv.org
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss