Mark,

It does seem like the issue is with the sFlow coming from nginx-sflow-module.
I wrote that module, so I can probably help:

(1) Is there just one instance of nginx on that server, or two?
(2) What version of nginx?
(3) Is nginx running single-threaded or multi-threaded?
(4) Is it running on Linux?
(5) Please upgrade to the latest nginx-sflow-module (0.9.8). The one you
are running (0.9.7) has a bug that affects graceful restarts. The fix was
a one-liner, so it's not a big step.
(6) Please capture and send a trace of the sFlow packets arriving from this
nginx source. For example, if the IP address is 10.1.2.3 and the packets are
coming in on eth0:

root> /usr/sbin/tcpdump -i eth0 -s 0 -w nginx_sflow.pcap udp port 6343 and ip src 10.1.2.3
<control-c after a few minutes to stop>
root> gzip nginx_sflow.pcap

Then send nginx_sflow.pcap.gz.

(7) Please also send /etc/hsflowd.conf.

The kind of thing it might be:
  - two instances of nginx-sflow-module running on the same host and not
being told apart properly. (This is supposed to happen automatically: each one
chooses its sFlow datasource index as the lowest-numbered TCP port that its
process is listening on.)
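
To illustrate that point, here is a minimal Python sketch. It is not code from
nginx-sflow-module or gmond; the helper names, port numbers, and counter values
are all made up for illustration. It picks the lowest listening TCP port as the
datasource index, then computes per-second rates from running totals (difference
divided by interval, as the gmond behaviour is described later in this thread)
to show what a collector might see if two instances were reported under one index.

```python
def datasource_index(listening_ports):
    """Hypothetical helper: use the lowest-numbered TCP port as the index."""
    if not listening_ports:
        raise ValueError("process is not listening on any TCP port")
    return min(listening_ports)


def rates(samples):
    """Per-second rates from (timestamp_seconds, running_total) samples."""
    return [
        (c1 - c0) / (t1 - t0)
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]


# Two hypothetical nginx instances on one host, listening on different ports.
index_a = datasource_index([443, 8443])
index_b = datasource_index([80, 8080])

# Made-up running totals of HTTP GETs, reported every 10 seconds.
instance_a = [(0, 1000), (10, 1100), (20, 1200)]
instance_b = [(5, 50), (15, 60), (25, 70)]

# Kept apart by datasource index, each stream yields a steady rate.
separate = {index_a: rates(instance_a), index_b: rates(instance_b)}

# Reported under one index, the two streams interleave and the deltas jump
# between unrelated counters, giving wildly varying (even negative) rates.
merged = rates(sorted(instance_a + instance_b))

print(separate)
print(merged)
```

If the two instances really did share an index, the collector would also see
their sequence numbers interleaved, which is one way a "sequence number error"
could arise. That is only a guess until the trace confirms it.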

Regards,
Neil




On Fri, Mar 7, 2014 at 3:40 PM, Bernard Li <bern...@vanhpc.org> wrote:

> Can you connect to the gmond port and paste the XML for the metrics in
> question?  I'd like to see how they're defined.
>
> Thanks,
>
> Bernard
>
> On Fri, Mar 7, 2014 at 11:08 AM, Flanagan, Mark <mark.flana...@unify.com>
> wrote:
> > http://www.sflow.org/ appears to be the defining entity for sflow.
> > http://www.sflow.org/sflow_http.txt would appear to define the http
> > sflow data.
> >
> > It is not explicitly clear just what the "counter" values are supposed
> > to mean. The general architecture of sflow-like data would suggest the
> > values should be a running counter (like the network interface metrics),
> > which would mean gmond is handling the packets properly and NGINX is
> > sending the wrong data.
> >
> > That's just my guess for now.
> >
> >
> > -----Original Message-----
> > From: Bernard Li [mailto:bern...@vanhpc.org]
> > Sent: Friday, March 07, 2014 1:39 PM
> > To: Silver, Jonathan
> > Cc: ganglia-general@lists.sourceforge.net; Flanagan, Mark
> > Subject: Re: [Ganglia-general] NGINX / SFLOW / Ganaglia - metrics get
> > corrupted
> >
> > Hi Jonathan:
> >
> > Perhaps you can share how these metrics are defined?
> >
> > Cheers,
> >
> > Bernard
> >
> > On Fri, Mar 7, 2014 at 10:21 AM, Silver, Jonathan
> > <jonathan.sil...@unify.com> wrote:
> >> Does the following analysis mean anything to anyone?
> >> It seems to me that this is a basic thing that should have been seen by
> >> everyone else and found during first test - unless it's some config
> >> parameter.
> >>
> >> Thanks
> >> Jon
> >>
> >> -----------------------------------------------------------------------
> >>
> >> Well, I think I understand what is happening - but I don't even want to
> >> think about fixing it. I'm not sure which software is right.
> >>
> >> The sflow data coming from NGINX reports the number of various HTTP
> >> messages (GET, HEAD, 1XX, 2XX, etc) in the measured period.
> >> The period is either 10 or 20 seconds - I don't have any idea why that
> >> isn't consistent.
> >>
> >> When gmond receives the HTTP data in sflow format, it computes the
> >> difference between the most recently reported value and the one before and
> >> divides that by the reported interval. That is, it is expecting a running
> >> total and that is NOT what is received.
> >>
> >> I don't know which software is right, but the NGINX reports are not
> >> what the gmond handler expects.
> >>
> >> All the other sflow reports appear to be correct.
> >>
> >> -- Mark
> >>
> >>>
> >>> Flow plug-in:  I am still trying to find out, it is actually built by
> >>> another group and I'm not sure what they pulled, but I'm pretty sure
> >>> it's 0.9.8
> >>>
> >>> hsflowd version 1.23.2
> >>>
> >>> gmond 3.6.0
> >>>
> >>>  -------------------------------------------------------------
> >>> On Tuesday, 4 March 2014, Silver, Jonathan <jonathan.sil...@unify.com>
> >>> wrote:
> >>>
> >>> We're using NGINX and sflow to capture and send the metrics to
> >>> ganglia.
> >>> The metric values look correct when viewed using sflowtool, but gmond
> >>> (on the same box) is reporting them with all kinds of random values.
> >>>
> >>> Running gmond --debug=10 I do see various error messages in the
> >>> log:
> >>>
> >>> Some of these:
> >>> sequence number error - 10.235.240.31:443-3:443 lostSamples=37
> >>>
> >>> Some of these:
> >>> ERROR: [Errno 111] Connection refused
> >>>
> >>> And some with the hostname NULL:  (But only one time for each metric)
> >>> ***Allocating value packet for host--(null)-- and metric
> >>> --http_meth_put--
> >>> ****
> >>>
> >>>
> >>> Has anyone heard of this issue? I've started adding debug statements
> >>> to gmond, but before I go through all of that, if it's a known
> >>> issue.....
> >>>
> >>> Thanks for any info,
> >>> jon
> >>>
> >>>
> >>>
> >>> ----------------------------------------------------------------------
> >>> -------- Subversion Kills Productivity. Get off Subversion & Make the
> >>> Move to Perforce.
> >>> With Perforce, you get hassle-free workflows. Merge that actually
> works.
> >>> Faster operations. Version large binaries.  Built-in WAN optimization
> >>> and the freedom to use Git, Perforce or both. Make the move to
> >>> Perforce.
> >>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.
> >>> clktrk _______________________________________________
> >>> Ganglia-general mailing list
> >>> Ganglia-general@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
