Hi Robert:

When you said you tested the Python metric modules, did you just test the
Python scripts under Windows, or did you somehow get gmond compiled
natively under Windows with Python support?

Thanks,

Bernard

On Thursday, July 12, 2012, Robert Alexander wrote:

> Hey,
>
> A meeting may be a good idea.  My schedule is mostly open next week.  When
> are others free?  I will brush up on sFlow by then.
>
> NVML and the Python metric module are tested at NVIDIA on Windows and
> Linux, but not within Cygwin.  The process will be easier/faster on the
> NVML side if we keep Cygwin out of the loop.
>
> -Robert
>
> -----Original Message-----
> From: Bernard Li [mailto:bern...@vanhpc.org]
> Sent: Thursday, July 12, 2012 10:49 AM
> To: Nigel LEACH
> Cc: lozgachev.i...@gmail.com; ganglia-general@lists.sourceforge.net;
> Peter Phaal; Robert Alexander
> Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin
>
> Hi Nigel:
>
> Technically you only need 3.1 gmond to have support for the Python metric
> module.  But I'm not sure whether we have ever tested this under Windows.
>
> Peter and Robert: How quickly can we get hsflowd to support GPU metrics
> collection internally?  Should we set up a meeting to discuss this?
>
> Thanks,
>
> Bernard
>
> On Thu, Jul 12, 2012 at 4:05 AM, Nigel LEACH <nigel.le...@uk.bnpparibas.com> wrote:
> > Thanks Ivan, but we have gmond 3.0 and 3.1 running under Cygwin (and
> > using APR); the problem is with the 3.4 spin.
> >
> > -----Original Message-----
> > From: lozgachev.i...@gmail.com [mailto:lozgachev.i...@gmail.com]
> > Sent: 12 July 2012 11:54
> > To: Nigel LEACH
> > Cc: peter.ph...@gmail.com; ganglia-general@lists.sourceforge.net
> > Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin
> >
> > Hi all,
> >
> > Maybe this will be of interest. Some time ago I successfully compiled
> > gmond 3.0.7 and 3.1.2 under Cygwin. If you need them, I can upload the
> > gmond and third-party sources plus my compilation script somewhere.
> > I also have gmetad 3.0.7 compiled for Windows. In addition, I developed
> > (just for fun) my own implementation of gmetad 3.1.2 using .NET and C#.
> >
> > P.S. I do not know whether it is possible to use these gmond versions
> > to collect statistics from GPUs.
> >
> > --
> > Best regards,
> > Ivan.
> >
> > 2012/7/12 Nigel LEACH <nigel.le...@uk.bnpparibas.com>:
> >> Thanks for the updates Peter and Bernard.
> >>
> >> I have been unable to get gmond 3.4 working under Cygwin; my latest
> >> errors occur while parsing gm_protocol_xdr.c. I don't know whether we
> >> should follow this up. It would be nice to have a Windows gmond, but my
> >> only reason for upgrading is the GPU metrics.
> >>
> >> I take your point about re-using the existing GPU module and gmetric;
> >> unfortunately, I don't have experience with Python. My plan is to write
> >> something in C to export the NVML metrics, with various output options.
> >> We will then decide whether to call this new code from our existing
> >> gmond 3.1 via gmetric, from the new gmond 3.4 (if we get it working), or
> >> from one of our existing third-party tools, ITRS Geneos.
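One way to read "various output options" is to keep metric collection and output formatting separate. The following is a minimal sketch under that assumption: `format_metric`, `output_format`, `OUT_GMETRIC` and `OUT_PLAIN` are hypothetical names, not part of NVML or Ganglia, and a real exporter would obtain values from NVML (for example via nvmlDeviceGetTemperature) rather than receiving them as strings. The gmetric mode builds a command line using gmetric's `--name`, `--value`, `--type` and `--units` options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output modes for a standalone NVML metric exporter. */
typedef enum { OUT_GMETRIC, OUT_PLAIN } output_format;

/* Write one metric sample into buf using the chosen output mode.
 * Returns the length reported by snprintf, or -1 for an unknown mode.
 * Assumes name/value/units contain no spaces or shell metacharacters;
 * a real tool should quote them or exec gmetric with an argv array. */
int format_metric(output_format fmt, const char *name, const char *type,
                  const char *value, const char *units,
                  char *buf, size_t len)
{
    assert(name != NULL && type != NULL && value != NULL && units != NULL);
    assert(buf != NULL && len > 0);

    switch (fmt) {
    case OUT_GMETRIC:
        /* Command line for Ganglia's gmetric tool. */
        return snprintf(buf, len,
                        "gmetric --name=%s --value=%s --type=%s --units=%s",
                        name, value, type, units);
    case OUT_PLAIN:
        /* Simple "name value units" line for other collectors. */
        return snprintf(buf, len, "%s %s %s", name, value, units);
    }
    return -1;
}
```

For example, `format_metric(OUT_GMETRIC, "gpu0_temp", "uint32", "45", "Celsius", buf, sizeof buf)` produces `gmetric --name=gpu0_temp --value=45 --type=uint32 --units=Celsius`. The metric name and value here are illustrative only. Because the formatter knows nothing about NVML, the same collection code could feed gmetric, a future gmond, or a third-party tool by switching modes.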
> >>
> >> As regards your list of metrics, they are pretty definitive, but I
> >> will probably also export:
> >>
> >> * total ECC errors - nvmlDeviceGetTotalEccErrors
> >> * individual ECC errors - nvmlDeviceGetDetailedEccErrors
> >> * active compute processes - nvmlDeviceGetComputeRunningProcesses
> >>
> >> Regards
> >> Nigel
> >>
> >> -----Original Message-----
> >> From: peter.ph...@gmail.com [mailto:peter.ph...@gmail.com]
> >> Sent: 10 July 2012 20:06
> >> To: Nigel LEACH
> >> Cc: bern...@vanhpc.org; ganglia-general@lists.sourceforge.net
> >> Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin
> >>
> >> Nigel,
> >>
> >> A simple option would be to use Host sFlow agents to export the core
> >> metrics from your Windows servers and use gmetric to add the GPU
> >> metrics.
> >>
> >> You could combine code from the Python GPU module and gmetric
> >> implementations to produce a self-contained script for exporting GPU
> >> metrics:
> >>
> >> https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia
> >> https://github.com/ganglia/ganglia_contrib
> >>
> >> Longer term, it would make sense to extend Host sFlow to use the
> >> C-based NVML API to extract and export metrics. This would be
> >> straightforward: the Host sFlow agent uses native C APIs on the
> >> platforms it supports to extract metrics.
> >>
> >> What would take some thought is developing a standard set of summary
> >> metrics to characterize GPU performance. Once the set of metrics is
> >> agreed on, adding them to the sFlow agent is pretty trivial.
> >>
> >> Currently the Ganglia Python module exports the following metrics. Are
> >> they the right set? Is anything missing? It would be great to get
> >> involvement from the broader Ganglia community to capture best practice
> >> from anyone running large GPU clusters, as well as input from NVIDIA
> >> about the key metrics.
> >>
> >> * gpu_num
> >> * gpu_driver
> >> * gpu_type
> >> * gpu_uuid
> >> * gpu_pci_id
> >> * gpu_mem_total
> >> * gpu_graphics_speed
> >> * gpu_sm_speed
> >> * gpu_mem_speed
> >> * gpu_max_graphics_speed
> >> * gpu_max_sm_speed
> >> * gpu_max_mem_speed
> >> * gpu_temp
> >> * gpu_util
> >> * gpu_mem_util
> >> * gpu_mem_used
> >> * gpu_fan
> >> * gpu_power_usage
> >> * gpu_perf_state
> >> * gpu_ecc_mode
> >>
> >> As far as scalability is concerned, you should find that moving to
> >> sFlow as the measurement transport reduces network traffic, since all
> >> the metrics for a node are transported in a single UDP datagram (rather
> >> than one datagram per metric when using gmond as the agent). The other
> >> consideration is that sFlow is unicast, so if you are using a multicast
> >> Ganglia setup, this involves restructuring your configuration.
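The traffic argument above reduces to simple per-interval arithmetic. This is a minimal sketch under a simplified model that is an assumption, not a measurement: every metric is reported once per interval, gmond sends one datagram per metric value, and Host sFlow sends one datagram per node. Metadata packets and multicast fan-out are ignored, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Simplified model (assumption): each metric is reported once per
 * interval; gmond sends one UDP datagram per metric value, while an
 * sFlow agent packs all of a node's metrics into one datagram.
 * Metadata packets and multicast fan-out are ignored. */

/* Datagrams per reporting interval with gmond as the agent. */
unsigned long gmond_datagrams(unsigned long nodes, unsigned long metrics_per_node)
{
    /* Guard against overflow of nodes * metrics_per_node. */
    assert(metrics_per_node == 0 || nodes <= ULONG_MAX / metrics_per_node);
    return nodes * metrics_per_node;
}

/* Datagrams per polling interval with Host sFlow as the agent. */
unsigned long sflow_datagrams(unsigned long nodes)
{
    return nodes;
}
```

For a hypothetical estate of 1,000 nodes each reporting 50 metrics, `gmond_datagrams(1000, 50)` gives 50,000 datagrams per interval, while `sflow_datagrams(1000)` gives 1,000. The node and metric counts are illustrative, not figures from this thread.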
> >>
> >> You still need at least one gmond instance, but it acts as an sFlow
> >> aggregator and is mute:
> >> http://blog.sflow.com/2011/07/ganglia-32-released.html
> >>
> >> Peter
> >>
> >> On Tue, Jul 10, 2012 at 8:36 AM, Nigel LEACH <nigel.le...@uk.bnpparibas.com> wrote:
> >>> Hello Bernard, I was coming to that conclusion. I've been trying to
> >>> compile on various combinations of Cygwin, Windows, and hardware this
> >>> afternoon, but without success yet. I've still got a few more tests to
> >>> do, though.
> >>>
> >>> The GPU plugin is my only reason for upgrading from our current
> >>> 3.1.7, and there is nothing else esoteric we use. We do have Linux
> >>> blades, but all of our Teslas are hosted on Windows. The entire
> >>> estate is quite large, so we would need to ensure sFlow scales. There
> >>> is no reason to think it won't, but I have little experience with it.
> >>>
> >>> Regards
> >>>
> >>> Nigel
> >>>
> >>> From: bern...@vanhpc.org [mailto:bern...@vanhpc.org]
> >>> Sent: 10 July 2012 16:19
> >>> To: Nigel LEACH
> >>> Cc: neil.mckee...@gmail.com; ganglia-general@lists.sourceforge.net
> >>> Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin
> >>>
> >>> Hi Nigel:
> >>>
> >>> Perhaps other developers can chime in, but I'm not sure whether the
> >>> latest version can be compiled under Windows; at least, I am not
> >>> aware of any testing having been done.
> >>>
> >>> Going forward, I would like to encourage users to use hsflowd under
> >>> Windows. I'm talking to the developers to see if we can add support
> >>> for GPU monitoring. Do you have any other requirements besides that?
> >>>
> >>> Thanks,
> >>>
> >>> Bernard
> >>>
> >>> On Tuesday, July 10, 2012, Nigel LEACH wrote:
> >>>
> >>> Hi Neil, Many thanks for the swift reply.
> >>>
> >>>
>
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
