Re: [Ganglia-developers] gmond segfault with libpython
If you're looking for someone, I could help with that - just send me a msg when you're sure ;)

On Sun, Feb 09, 2014 at 11:07:22PM -0800, Bernard Li wrote:

Do we still have a maintainer for the Ganglia packages for EPEL? If not, should we see if somebody would like to fill that position?

Thanks,
Bernard

On Sun, Feb 9, 2014 at 6:15 PM, Vladimir Vuksan vli...@veus.hr wrote:

Those RPMS work just fine for me

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m
[root@localhost ~]# rpm -ivh http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-modules-python-3.6.0-1.x86_64.rpm http://vuksan.com/centos/RPMS-6/x86_64/libganglia-3.6.0-1.x86_64.rpm http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-3.6.0-1.x86_64.rpm http://vuksan.com/centos/RPMS-6/x86_64/libconfuse-2.6-2.el6.rf.x86_64.rpm
Retrieving http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-modules-python-3.6.0-1.x86_64.rpm
Retrieving http://vuksan.com/centos/RPMS-6/x86_64/libganglia-3.6.0-1.x86_64.rpm
Retrieving http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-3.6.0-1.x86_64.rpm
Retrieving http://vuksan.com/centos/RPMS-6/x86_64/libconfuse-2.6-2.el6.rf.x86_64.rpm
warning: /var/tmp/rpm-tmp.JtASFF: Header V3 DSA/SHA1 Signature, key ID 6b8d79e6: NOKEY
Preparing...                ### [100%]
   1:libconfuse             ### [ 25%]
   2:libganglia             ### [ 50%]
   3:ganglia-gmond          ### [ 75%]
   4:ganglia-gmond-modules-p### [100%]
[root@localhost ~]# gmond -d 2
loaded module: core_metrics
loaded module: cpu_module
loaded module: disk_module
loaded module: load_module
loaded module: mem_module
loaded module: net_module
loaded module: proc_module
loaded module: sys_module
loaded module: python_module
udp_recv_channel mcast_join=239.2.11.71 mcast_if=NULL port=8649 bind=239.2.11.71 buffer=0

On 02/09/2014 09:39 AM, Jeff Layton wrote:

Vladimir,

I initially tried your binaries on my 6.5 system and I
could not get them to install and run (I think they were built with a 6.3 system). At some point I'll try building the rpm's and installing those. Hopefully there is no difference in the build process - that would be very interesting if the rpm's worked and building from source didn't :)

I'll let you know - but first I'm going to try Maciej's strace idea.

Thanks!

Jeff

P.S. There are some pretty significant differences between 6.4 and 6.5. One big one that I know of is the ntp format changed.

I have not seen issues with Centos 6, however I usually build my RPM packages. You could do that if you type

rpmbuild -tb ganglia-3.6.0.tar.gz

Alternatively, if you are interested in trying prebuilt packages you can find them here.

http://vuksan.com/centos/RPMS-6/x86_64/

Vladimir

On 02/08/2014 11:11 AM, Jeff Layton wrote:

Good morning,

I'm running a CentOS 6.5 system with ganglia 3.6.0 and ganglia-web 3.5.12. I'm following the general guidelines in this article:

http://sachinsharm.wordpress.com/tag/installing-ganglia/

Everything goes swimmingly and ganglia itself works fine. So I decided to go to the next step and try using Python with gmond. I followed the general guidelines in this article:

http://sachinsharm.wordpress.com/2013/08/19/setup-and-configure-ganglia-python-modules-on-centosrhel-6-3/

But when I start up gmond I get a segfault as reported in /var/log/messages.
Feb 5 19:58:47 home4 kernel: gmond[17992]: segfault at 8 ip 0036a7ce6ceb sp 7fffaad46bf0 error 4 in libpython2.6.so.1.0[36a7c0+15d000]
Feb 5 19:58:47 home4 abrt[18003]: Saved core dump of pid 17992 (/usr/local/sbin/gmond) to /var/spool/abrt/ccpp-2014-02-05-19:58:47-17992 (4284416 bytes)
Feb 5 19:58:47 home4 abrtd: Directory 'ccpp-2014-02-05-19:58:47-17992' creation detected
Feb 5 19:58:47 home4 abrtd: Executable '/usr/local/sbin/gmond' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Feb 5 19:58:47 home4 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2014-02-05-19:58:47-17992' exited with 1
Feb 5 19:58:47 home4 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2014-02-05-19:58:47-17992'

I've been trying to debug this but I have to admit that I'm coming up blank. Running gmond with debug doesn't give too much information:

[root@home4 laytonjb]# gmond -d 5 -c /etc/ganglia/gmond.conf
loaded module: core_metrics
loaded module: cpu_module
loaded module: disk_module
loaded module: load_module
loaded module: mem_module
loaded module:
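A kernel segfault line like the one quoted above can be picked apart mechanically. The sketch below is only an illustration: the addresses in the archived log look truncated, so the sample line uses a padded, hypothetical variant of it, and the function name `decode_segfault` is made up for this example. It subtracts the library's load base from the instruction pointer and decodes the x86 page-fault error bits (bit 0: protection fault vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode).

```python
import re

# x86 kernel segfault lines have the shape:
#   name[pid]: segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]
# All numeric fields are printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    code = int(m.group("error"), 16)
    return {
        "lib": m.group("lib"),
        "fault_addr": int(m.group("addr"), 16),
        "offset_in_lib": ip - base,          # where in the library the fault hit
        "protection_fault": bool(code & 1),  # False: page was not present
        "write": bool(code & 2),             # False: the access was a read
        "user_mode": bool(code & 4),         # True: fault happened in user space
    }


# Hypothetical, zero-padded variant of the log line from the thread.
SAMPLE = (
    "gmond[17992]: segfault at 8 ip 00000036a7ce6ceb sp 00007fffaad46bf0 "
    "error 4 in libpython2.6.so.1.0[36a7c00000+15d000]"
)

info = decode_segfault(SAMPLE)
print(info["lib"], hex(info["offset_in_lib"]), info)
```

A fault address as small as 8 together with a user-mode read usually points at a field being read through a NULL pointer, though that is an inference and not something the log states. With debug symbols installed, the computed offset could be handed to a tool such as addr2line or gdb to find the function involved.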
Re: [Ganglia-developers] gmond segfault with libpython
Ok so from that I can see that you're including:

include (/usr/local/etc/conf.d/*.conf)
include('/etc/ganglia/conf.d/*.pyconf')

Could you recheck what conf files you have in /usr/local/etc/conf.d/ ?

Next thing - why are you building those packages without setting any proper (FHS like) directories (http://pl.wikipedia.org/wiki/Filesystem_Hierarchy_Standard)? I'm almost sure that there is some configuration issue there.

On Sun, Feb 09, 2014 at 06:17:04PM -0500, Jeff Layton wrote:

Sure thing - I appreciate the help.

Build options:

./configure --with-gmetad

gmond.conf: http://pastebin.com/ExiMgqv0

strace output: I ran the strace using the following command:

strace -s 1024 -ff -o strace.log gmond -d 5 -c /etc/ganglia/gmond.conf

The output of the thread that has the segfault in it was uploaded to pastebin: http://pastebin.com/xScMVU6P

I had to erase the top 200 lines of the strace (too big and I'm not a pro user - yet :) ). But... just to be sure, I'm attaching the compressed tarball. Apologies to all but I just wanted to be sure.

Once again - thanks a million!

Jeff

Could you post here your build options (the ones you passed to ./configure) and also could you paste gmond.conf into pastebin? Also plz strace one more time, but now with strace -s 1024 -e trace=file and paste the output to pastebin

On Sun, Feb 09, 2014 at 04:55:14PM -0500, Jeff Layton wrote:

I hope this isn't too much output (I've heard about pastebin.com but never really used it).
[root@home4 ganglia-3.6.0]# ldd /usr/local/sbin/gmond
    linux-vdso.so.1 => (0x7fff667f6000)
    libapr-1.so.0 => /usr/lib64/libapr-1.so.0 (0x7f6a24049000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00337dc0)
    libganglia-3.6.0.so.0 => /usr/local/lib64/libganglia-3.6.0.so.0 (0x7f6a23e0d000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00337c40)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x003390c0)
    libz.so.1 => /lib64/libz.so.1 (0x00337d00)
    libpcre.so.0 => /lib64/libpcre.so.0 (0x003f7360)
    libexpat.so.1 => /lib64/libexpat.so.1 (0x00337f80)
    libconfuse.so.0 => /usr/lib64/libconfuse.so.0 (0x7f6a23bff000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00337c80)
    libc.so.6 => /lib64/libc.so.6 (0x00337c00)
    libuuid.so.1 => /lib64/libuuid.so.1 (0x00338380)
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00338c60)
    /lib64/ld-linux-x86-64.so.2 (0x00337bc0)
    libfreebl3.so => /usr/lib64/libfreebl3.so (0x00338ca0)

Below is the tree output:

[root@home4 ganglia-3.6.0]# tree /etc/ganglia
/etc/ganglia
├── conf.d
│   └── procstat.pyconf
├── gmetad.conf
└── gmond.conf

1 directory, 3 files

I looked at the strace file for process 3537 and I did see two places where gmond does an access() on the python_modules directory. Does gmond automatically look for the python modules so I don't need to put them in the modules section of gmond.conf?

Thanks a million!

Jeff

Oh I didn't think about going that lowlevel :) Could you run ldd on gmond also? Could you also run the 'tree' command on /etc/ganglia ?

It's interesting that you get the msg "loaded module: python_module" twice while starting gmond. Rechecking this against the strace log, it looks like those modules are being loaded twice? http://pastebin.com/BjdCGgbj

On Sun, Feb 09, 2014 at 03:28:10PM -0500, Jeff Layton wrote:

On 02/09/2014 02:48 PM, Jeff Layton wrote:

On 02/09/2014 02:28 PM, Maciej Lasyk wrote:

You could also try to catch on which particular check this segfault happens..?

Not sure how to check this.
When I run gmond interactively, it segfaults just after it says,

[root@home4 yum.repos.d]# /usr/local/sbin/gmond -d 5 -c /etc/ganglia/gmond.conf
loaded module: core_metrics
loaded module: python_module
loaded module: cpu_module
loaded module: disk_module
loaded module: load_module
loaded module: mem_module
loaded module: net_module
loaded module: proc_module
loaded module: sys_module
loaded module: python_module
Segmentation fault (core dumped)

I'm not sure where to begin checking. I'm a very old-fashioned debugger - I tend to use a great deal of print statements to track down where things are happening. I can start doing this in gmond.

I tried putting fprintf's all over gmond.c (yep - I'm that poor of a debugger). I'm not sure but it looks like it segfaults in the function setup_metric_callbacks on the statement,

if (modp->init && modp->init(global_context)) {

or on the function,

apr_pool_cleanup_register(global_context, modp, modular_metric_cleanup, apr_pool_cleanup_null);

I'm not too sure. I apologize if I'm wasting your time with my poor debugging skills.

Thanks!
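In the ldd listing quoted above, libganglia resolves from /usr/local/lib64 while everything else comes from the system directories. A small sketch for flagging such entries automatically is given below; the helper names (`parse_ldd`, `outside_system_dirs`) and the list of "system" prefixes are assumptions made for this illustration, and the sample text is a shortened copy of the listing from the thread. The same check could in principle be run on the output of ldd for modpython.so to see which libpython it binds to, although the thread does not show that output.

```python
def parse_ldd(text):
    """Map each library name in ldd output to its resolved path (or None)."""
    resolved = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no arrow
        name, _, target = line.partition("=>")
        parts = target.split()
        # "name => (0x...)" and "name => not found" carry no usable path
        path = parts[0] if parts and parts[0].startswith("/") else None
        resolved[name.strip()] = path
    return resolved


def outside_system_dirs(resolved,
                        system_prefixes=("/lib", "/lib64", "/usr/lib", "/usr/lib64")):
    """Return (name, path) pairs that resolve outside the assumed system dirs."""
    flagged = []
    for name, path in sorted(resolved.items()):
        if path is None:
            continue
        if not any(path.startswith(prefix + "/") for prefix in system_prefixes):
            flagged.append((name, path))
    return flagged


# Shortened copy of the ldd listing quoted in the thread.
SAMPLE = """\
    linux-vdso.so.1 => (0x7fff667f6000)
    libapr-1.so.0 => /usr/lib64/libapr-1.so.0 (0x7f6a24049000)
    libganglia-3.6.0.so.0 => /usr/local/lib64/libganglia-3.6.0.so.0 (0x7f6a23e0d000)
    libc.so.6 => /lib64/libc.so.6 (0x00337c00)
    /lib64/ld-linux-x86-64.so.2 (0x00337bc0)
"""

libs = parse_ldd(SAMPLE)
flagged = outside_system_dirs(libs)
print(flagged)
```

A library showing up under /usr/local is not a problem by itself (it is expected for a source build), so the output is only a starting point for spotting mixed installations.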
Re: [Ganglia-developers] gmond segfault with libpython
The only thing in /usr/local/etc/conf.d/ is modpython.conf.

Given your guidance I think I've figured things out (I think). It does appear that the python modules get loaded twice (actually 3 times in my case). The first time is in gmond.conf where I have it in the modules section:

modules {
  module {
    name = core_metrics
  }
  module {
    name = python_module
    path = /usr/local/lib64/ganglia/modpython.so
    params = /usr/local/lib64/ganglia/python_modules/
  }
  ...
}

At the end of /etc/ganglia/gmond.conf I have two include lines:

include (/usr/local/etc/conf.d/*.conf)
include('/etc/ganglia/conf.d/*.pyconf')

The first line includes the file /usr/local/etc/conf.d/modpython.conf. This file has the following lines:

[root@home4 ganglia]# more /usr/local/etc/conf.d/modpython.conf
/*
  params - path to the directory where mod_python should look for
  python metric modules

  the pyconf files in the include directory below will be scanned
  for configurations for those modules
*/
modules {
  module {
    name = python_module
    path = modpython.so
    params = /usr/local/lib64/ganglia/python_modules
  }
}

include (/etc/ganglia/conf.d/*.pyconf)

So it looks like the python modules get loaded 3 times (once for the first include, a second time for the include line in the file /usr/local/etc/conf.d/modpython.conf, and then a third time for the second include line in gmond.conf).

Therefore, I erased the module lines in gmond.conf so that I don't load them. I also erased the include line at the end of gmond.conf pointing to /etc/ganglia/conf.d/*.pyconf. The only include line in gmond.conf is the following:

include (/usr/local/etc/conf.d/*.conf)

You can find my current gmond.conf file here: http://pastebin.com/FJ2WAC4D

In the file /usr/local/etc/conf.d/modpython.conf, I commented out the last line, which is an include line pointing to /etc/ganglia/conf.d/*.pyconf.
The file now simply reads:

/*
  params - path to the directory where mod_python should look for
  python metric modules

  the pyconf files in the include directory below will be scanned
  for configurations for those modules
*/
modules {
  module {
    name = python_module
    path = modpython.so
    params = /usr/local/lib64/ganglia/python_modules
  }
}

I think all of this means that python modules only get loaded once, when gmond.conf does the include that points to /usr/local/etc/conf.d/*.conf

Note - this file looks like:

/*
  params - path to the directory where mod_python should look for
  python metric modules

  the pyconf files in the include directory below will be scanned
  for configurations for those modules
*/
modules {
  module {
    name = python_module
    path = /usr/local/lib64/ganglia/modpython.so
    params = /usr/local/lib64/ganglia/python_modules
  }
}

I think this should fix the problem so I tried running gmond interactively:

/usr/local/sbin/gmond -d 5 -c /etc/ganglia/gmond.conf

I still get a segfault.

As an aside, this is just an experiment so I can learn about writing python modules in Ganglia. Therefore I'm not too concerned about the location of configuration files since it's temporary. But, I followed all of the defaults in ganglia about installing the code to /usr/local. I did create the directory /etc/ganglia since I wanted all ganglia related files to be in one location rather than spread across all of /etc (it may not be FHS compliant but it's a practice I have developed over the years).

In general I followed this blog: http://sachinsharm.wordpress.com/tag/installing-ganglia/ for building and installing ganglia. Everything worked just fine until I followed this blog http://sachinsharm.wordpress.com/2013/08/19/setup-and-configure-ganglia-python-modules-on-centosrhel-6-3/ for configuring Python modules. But I backed out all of the changes in that blog so that I was starting in a clean configuration.

Thanks for the help! You have been very patient and I really appreciate it.
Jeff
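The include chain described in the message above can be traced with a short script. This is a sketch under assumptions, not how gmond or libconfuse actually parse configuration: it works on an in-memory dictionary of files modeled on the layout from the thread, uses simple regular expressions instead of a real parser, and the contents of procstat.pyconf are a placeholder because the thread never shows them. It counts how many times each module name is declared and how many times each file is pulled in.

```python
import fnmatch
import re

COMMENT_RE = re.compile(r"/\*.*?\*/", re.S)
INCLUDE_RE = re.compile(r"include\s*\(\s*['\"]?([^'\")\s]+)['\"]?\s*\)")
MODULE_RE = re.compile(r"\bmodule\s*\{[^{}]*?\bname\s*=\s*\"?(\w+)\"?")

# In-memory stand-ins for the files described in the thread.
FILES = {
    "/etc/ganglia/gmond.conf": """
modules {
  module { name = core_metrics }
  module {
    name = python_module
    path = /usr/local/lib64/ganglia/modpython.so
    params = /usr/local/lib64/ganglia/python_modules/
  }
}
include (/usr/local/etc/conf.d/*.conf)
include('/etc/ganglia/conf.d/*.pyconf')
""",
    "/usr/local/etc/conf.d/modpython.conf": """
/* params - path to the directory where mod_python should look */
modules {
  module {
    name = python_module
    path = modpython.so
    params = /usr/local/lib64/ganglia/python_modules
  }
}
include (/etc/ganglia/conf.d/*.pyconf)
""",
    # Placeholder: the real contents are not shown in the thread.
    "/etc/ganglia/conf.d/procstat.pyconf": "/* contents not shown */",
}


def scan(path, files, module_counts, include_visits, depth=0):
    """Visit one file, count module names, then follow its include() globs."""
    if depth > 10:  # crude guard against include loops
        return
    include_visits[path] = include_visits.get(path, 0) + 1
    text = COMMENT_RE.sub("", files[path])
    for name in MODULE_RE.findall(text):
        module_counts[name] = module_counts.get(name, 0) + 1
    for pattern in INCLUDE_RE.findall(text):
        for candidate in sorted(files):
            if fnmatch.fnmatchcase(candidate, pattern):
                scan(candidate, files, module_counts, include_visits, depth + 1)


module_counts = {}
include_visits = {}
scan("/etc/ganglia/gmond.conf", FILES, module_counts, include_visits)
print(module_counts)
print(include_visits)
```

For this modeled layout the script reports the python_module block declared twice and the .pyconf file pulled in twice, which is at least consistent with "loaded module: python_module" being printed twice in the earlier gmond output. Whether duplicate declarations are what triggers the crash is not established by the thread.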
Re: [Ganglia-developers] gmond segfault with libpython
On 02/09/2014 09:15 PM, Vladimir Vuksan wrote:

Those RPMS work just fine for me

Interesting. When I started my project about a month ago, I tried your rpm's (they came up first on a google search) but I couldn't get them to install; it appears they were built for CentOS 6.3 perhaps? The link that came up is:

http://vuksan.com/centos/RPMS/x86_64/

that doesn't match your URL (I didn't know about RPMS-6). The 3.6.0 rpm's are dated 2013 (07-May-2013), which is why I had problems.
Jeff

Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
Re: [Ganglia-developers] GSoC application started, more help needed
Hi! Thanks for answering!

On 02/10/2014 08:27 PM, Daniel Pocock wrote:

On 07/02/14 21:19, Adrian Sevcenco wrote:

On 02/07/2014 09:46 PM, Daniel Pocock wrote:

Please feel free to add potential project ideas here: https://github.com/ganglia/monitor-core/wiki/GSoC-2014-project-ideas

Hi! There were several discussions on the list regarding what I will mention, and I will reiterate the basic points in order to have some kind of definitive closure regarding these and see if they are worth doing (either in GSoC or not).

I will refer only to the gmond framework:

1. Adding a string to globals, similar to hostname, named something like host_uuid; it can contain either a fixed (overridden) uuid or some automatic approach can be chosen (later) (with sensible defaults like empty); this could pave the way to having a uuid--metrics association instead of hostname--metrics

I already have something like that on a branch, it is in the wiki somewhere

Great!

2. make cluster name be (also) a pool metric of the host; this could pave the way to having gmond aggregators (gmonds that gather data from devices in close network proximity but in different logical partitions (clusters)); something like this I think would be useful in clouds or distributed computing associations like the grid.

I imagine/hope that these addons will have no impact on gmetad and are completely backward compatible.

So, what do the experts think?

Thank you for taking this into consideration,
Adrian

The only danger with (1) is that it involves changing the core agent. Finding students with good C skills and supervising their work on the agent itself is a little more demanding than supervising a student who makes a plugin or some piece of work to complement Ganglia.

yeah, this is true, but the example code for (1) is already in ganglia.. one just needs to stitch the pieces.. also the code for auto can be added later when the metric is already accepted in the framework.

Could you comment more on (2)?
I frequently see requests from people who want overlapping cluster aggregation. For example, somebody may want to be able to see aggregate reports for any of the following sets:

a) grouped by OS (Linux, Solaris, Windows)
b) grouped by vendor (Intel, AMD, ARM, ...)
c) grouped by role (production servers, test servers, development servers)

Well, the idea of (2) is to shift the task of grouping by a category from gmond to gmetad (or another in-house developed monitoring frontend). It has nothing to do with scenarios like a), b) and c) but with the capacity to transmit data of _different_ clusters through a single aggregation gmond (channel).

For the sake of simplicity I will imagine a cloud example even if my need and experience is with grid computing: let's say that you make several instances of computing and storage with a provider in one area and some other instances in another area (and maybe another provider). You want to have 2 clusters: computing and storage. So you set up a gmond aggregator on each site that gathers _both_ storage and computing information and sends data through ipsec to the gmond aggregator on your home/institutional monitoring node. At that point a gmetad (or another custom-built frontend) reads the data and writes the corresponding data into each corresponding cluster for all machines (ignoring the cluster tag of the aggregator gmond that surrounds all data).

IMHO the a), b), c) scenarios are the responsibility of the presentation framework (ganglia web) and I think this could be done by some json views...

For me this would be very useful.. is this something that would help others as well?

Thanks!
Adrian
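The regrouping step in proposal (2) can be pictured with a toy example. Everything below is hypothetical: the record layout, the host and site names, and the function `regroup_by_cluster` are invented for illustration and do not correspond to gmond's wire format or to any existing gmetad code. The sketch only shows the idea that, if every host carried its own cluster name as ordinary metadata, a frontend could sort hosts into clusters regardless of which aggregating gmond forwarded them.

```python
from collections import defaultdict

# Hypothetical records as an aggregating gmond might forward them: each host
# reports its own cluster name, independent of the site it was collected at.
RECORDS = [
    {"host": "node-a1", "site": "site-a", "cluster": "computing", "load_one": 0.7},
    {"host": "disk-a1", "site": "site-a", "cluster": "storage", "load_one": 0.2},
    {"host": "node-b1", "site": "site-b", "cluster": "computing", "load_one": 1.3},
    {"host": "disk-b1", "site": "site-b", "cluster": "storage", "load_one": 0.1},
]

# Fields that describe where a record came from rather than a metric value.
METADATA_KEYS = ("host", "site", "cluster")


def regroup_by_cluster(records):
    """Return {cluster: {host: {metric: value}}}, ignoring the collecting site."""
    clusters = defaultdict(dict)
    for rec in records:
        metrics = {k: v for k, v in rec.items() if k not in METADATA_KEYS}
        clusters[rec["cluster"]][rec["host"]] = metrics
    return dict(clusters)


grouped = regroup_by_cluster(RECORDS)
for cluster_name in sorted(grouped):
    print(cluster_name, sorted(grouped[cluster_name]))
```

Keeping the site as a separate field means the same records could still be viewed per site, which is one reason to treat cluster membership as data on the host and not as a property of the channel it travelled through.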
Re: [Ganglia-developers] gmond segfault with libpython
At this point I suggest you wipe out that Ganglia installation and just use the Epel repo - it has everything you need (gmond, gmetad, ganglia-gmond-python).

That blog you were following is terrible. It's very bad to run make install without creating packages - no one should do this. Moreover - this installation is not based on any good filesystem hierarchy standard. Configuration files in /usr/local? Editing ld.so.conf instead of creating a file in ld.so.conf.d? Those are really bad practices that lead guys to situations like yours.

The Epel repo is very good, stable and secure. You can easily use it instead of creating your own packages. And if you really have to - use rpmbuild or https://github.com/jordansissel/fpm

And try installing Centos minimal at first - without any additional packages. It really makes things simple :)

This segfault looks like some Python version problem; maybe you have more than one Python installed or maybe you have some issues with Python libraries. It's really hard to find sometimes - I would suggest cleaning this installation and starting over using packages.

On Mon, Feb 10, 2014 at 10:54:35AM -0500, Jeff Layton wrote:

The only thing in /usr/local/etc/conf.d/ is modpython.conf. Given your guidance I think I've figured things out (I think).
Re: [Ganglia-developers] gmond segfault with libpython
I'm leaning this way :) I think things have gotten too screwed up (to use a technical term) and there are problems.

The thing I'm concerned about is that the epel repo only has version 3.1.7 (seems pretty darn old to me). I want something newer and I want the new web interface.

[root@home4 ganglia]# yum list all | grep -i ganglia
ganglia.i686                            3.1.7-6.el6    epel
ganglia.x86_64                          3.1.7-6.el6    epel
ganglia-devel.i686                      3.1.7-6.el6    epel
ganglia-devel.x86_64                    3.1.7-6.el6    epel
ganglia-gmetad.x86_64                   3.1.7-6.el6    epel
ganglia-gmond.x86_64                    3.1.7-6.el6    epel
ganglia-gmond-python.x86_64             3.1.7-6.el6    epel
ganglia-web.x86_64                      3.1.7-6.el6    epel
libnodeupdown-backend-ganglia.x86_64    1.14-1.el6     epel

I'm going to try Vladimir's rpm's first but they look really old to me (May 7 2013), which is before Centos 6.5 was out.

I may be hitting the mailing list again this evening (I'm writing an article about ganglia that is due in 2 days so I need to finish quickly).

Thanks!

Jeff
Re: [Ganglia-developers] gmond segfault with libpython
Jeff, RPMS-6 are the Centos 6 RPMS. RPMS/ are Centos 5 RPMS. Sorry about the confusion. On 02/10/2014 04:33 PM, Jeff Layton wrote: I'm leaning this way :) I think things have gotten too screwed up (to use a technical term) and there are problems. The thing I'm concerned about is that the epel repo only has version 3.1.7 (seems pretty darn old to me). I want something newer and I want the new web interface. [root@home4 ganglia]# yum list all | grep -i ganglia ganglia.i686 3.1.7-6.el6epel ganglia.x86_64 3.1.7-6.el6epel ganglia-devel.i686 3.1.7-6.el6epel ganglia-devel.x86_64 3.1.7-6.el6epel ganglia-gmetad.x86_64 3.1.7-6.el6epel ganglia-gmond.x86_64 3.1.7-6.el6epel ganglia-gmond-python.x86_64 3.1.7-6.el6epel ganglia-web.x86_64 3.1.7-6.el6epel libnodeupdown-backend-ganglia.x86_64 1.14-1.el6 epel I'm going to try Vladimir's rpm's first but they look really old to me (May 7 2013) which is before Centos 6.5 was out. I may be hitting the mailing list again this evening (I'm writing an article about ganglia that is due in 2 days so I need to finish quickly). Thanks! Jeff At this point I suggest you - wipe out that Ganglia installation and just use Epel repo - it has everything you need (gmond, gmetad, ganglia-gmond-python). That blog you was basing on is terrible. It's very bad to make install without creating packages - no one should do this. Moreover - this installation is not based on any good filesystem hierarchy standard. Configuration files in /usr/local? Editing ld.so.conf instead of creating file in ld.so.conf.d? Those are really bad practices that lead guys to situations like yours. Epel repo is very good, stable and secure. You can easily use it instead of creating your own packages. And if you really have to - use rpmbuild or https://github.com/jordansissel/fpm And try installing Centos minimal at first - without any additional packages. 
It really makes things simple :) This segfault looks like some Python version problem; maybe you have more than one Python installed or maybe you have some issues with Python libraries. It's really hard to find sometimes - I would suggest you cleaning this installation and starting over using packages. On Mon, Feb 10, 2014 at 10:54:35AM -0500, Jeff Layton wrote: The only thing in /usr/local/etc/conf.d/ is modpython.conf. Given your guidance I think I've figured things out (I think). It does appear that the python modules get loaded twice (actually 3 times in my case). The time is in gmond.conf where I have it in the modules section: modules { module { name = core_metrics } module { name = python_module path = /usr/local/lib64/ganglia/modpython.so params = /usr/local/lib64/ganglia/python_modules/ } ... } At the end of /etc/ganglia/gmond.conf I have two include lines: include (/usr/local/etc/conf.d/*.conf) include('/etc/ganglia/conf.d/*.pyconf') The first line includes the file /usr/local/etc/conf.d/modpython.conf. This file has the following lines: [root@home4 ganglia]# more /usr/local/etc/conf.d/modpython.conf /* params - path to the directory where mod_python should look for python metric modules the pyconf files in the include directory below will be scanned for configurations for those modules */ modules { module { name = python_module path = modpython.so params = /usr/local/lib64/ganglia/python_modules } } include (/etc/ganglia/conf.d/*.pyconf) So it looks like the python modules get loaded 3 times (once for the first include, a second time for the include line in the file /usr/local/etc/conf.d/modpython.conf, and then a third time for the second include line in gmond.conf. Therefore, I erased the module lines in gmond.conf so that I don't load them. I also erased the include line at the end of gmond.conf pointing to /etc/ganglia/conf.d/*.pyconf. 
The only include line in gmond.conf is the following:

    include (/usr/local/etc/conf.d/*.conf)

You can find my current gmond.conf file here: http://pastebin.com/FJ2WAC4D

In the file /usr/local/etc/conf.d/modpython.conf, I commented out the last line, which is an include line pointing to /etc/ganglia/conf.d/*.pyconf. The file now simply reads:

    /*
      params - path to the directory where mod_python
               should look for python metric modules

      the pyconf files in the include directory below
      will be scanned for configurations for those modules
    */
    modules {
      module {
        name = python_module
        path = modpython.so
        params = /usr/local/lib64/ganglia/python_modules
      }
    }

I think all of this means that python modules only get loaded once when gmond.conf does the include that points to
Re: [Ganglia-developers] gmond segfault with libpython
Hi Maciej:

Please come find us on IRC and let's talk.

Thanks,

Bernard

On Mon, Feb 10, 2014 at 1:24 AM, Maciej Lasyk mac...@lasyk.info wrote:

If you'd look for someone I could help with that - just send me a msg when you're sure ;)

On Sun, Feb 09, 2014 at 11:07:22PM -0800, Bernard Li wrote:

Do we still have a maintainer for the Ganglia packages for EPEL? If not, should we see if somebody would like to fill that position?

Thanks,

Bernard

On Sun, Feb 9, 2014 at 6:15 PM, Vladimir Vuksan vli...@veus.hr wrote:

Those RPMS work just fine for me

    [root@localhost ~]# uname -a
    Linux localhost.localdomain 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
    [root@localhost ~]# cat /etc/issue
    CentOS release 6.5 (Final)
    Kernel \r on an \m
    [root@localhost ~]# rpm -ivh \
        http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-modules-python-3.6.0-1.x86_64.rpm \
        http://vuksan.com/centos/RPMS-6/x86_64/libganglia-3.6.0-1.x86_64.rpm \
        http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-3.6.0-1.x86_64.rpm \
        http://vuksan.com/centos/RPMS-6/x86_64/libconfuse-2.6-2.el6.rf.x86_64.rpm
    Retrieving http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-modules-python-3.6.0-1.x86_64.rpm
    Retrieving http://vuksan.com/centos/RPMS-6/x86_64/libganglia-3.6.0-1.x86_64.rpm
    Retrieving http://vuksan.com/centos/RPMS-6/x86_64/ganglia-gmond-3.6.0-1.x86_64.rpm
    Retrieving http://vuksan.com/centos/RPMS-6/x86_64/libconfuse-2.6-2.el6.rf.x86_64.rpm
    warning: /var/tmp/rpm-tmp.JtASFF: Header V3 DSA/SHA1 Signature, key ID 6b8d79e6: NOKEY
    Preparing...### [100%]
    1:libconfuse ### [ 25%]
    2:libganglia ### [ 50%]
    3:ganglia-gmond ### [ 75%]
    4:ganglia-gmond-modules-p### [100%]
    [root@localhost ~]# gmond -d 2
    loaded module: core_metrics
    loaded module: cpu_module
    loaded module: disk_module
    loaded module: load_module
    loaded module: mem_module
    loaded module: net_module
    loaded module: proc_module
    loaded module: sys_module
    loaded module: python_module
    udp_recv_channel mcast_join=239.2.11.71 mcast_if=NULL
port=8649 bind=239.2.11.71 buffer=0

On 02/09/2014 09:39 AM, Jeff Layton wrote:

Vladimir,

I initially tried your binaries on my 6.5 system and I could not get them to install and run (I think they were built with a 6.3 system). At some point I'll try building the RPMs and installing those. Hopefully there is no difference in the build process - it would be very interesting if the RPMs worked and building from source didn't :)

I'll let you know - but first I'm going to try Maciej's strace idea.

Thanks!

Jeff

P.S. There are some pretty significant differences between 6.4 and 6.5. One big one that I know of is that the ntp format changed.

I have not seen issues with CentOS 6; however, I usually build my own RPM packages. You can do that by typing

    rpmbuild -tb ganglia-3.6.0.tar.gz

Alternatively, if you are interested in trying prebuilt packages, you can find them here:

http://vuksan.com/centos/RPMS-6/x86_64/

Vladimir

On 02/08/2014 11:11 AM, Jeff Layton wrote:

Good morning,

I'm running a CentOS 6.5 system with ganglia 3.6.0 and ganglia-web 3.5.12. I'm following the general guidelines in this article:

http://sachinsharm.wordpress.com/tag/installing-ganglia/

Everything goes swimmingly and ganglia itself works fine. So I decided to go to the next step and try using Python with gmond. I followed the general guidelines in this article:

http://sachinsharm.wordpress.com/2013/08/19/setup-and-configure-ganglia-python-modules-on-centosrhel-6-3/

But when I start up gmond I get a segfault as reported in /var/log/messages.
    Feb 5 19:58:47 home4 kernel: gmond[17992]: segfault at 8 ip 0036a7ce6ceb sp 7fffaad46bf0 error 4 in libpython2.6.so.1.0[36a7c0+15d000]
    Feb 5 19:58:47 home4 abrt[18003]: Saved core dump of pid 17992 (/usr/local/sbin/gmond) to /var/spool/abrt/ccpp-2014-02-05-19:58:47-17992 (4284416 bytes)
    Feb 5 19:58:47 home4 abrtd: Directory 'ccpp-2014-02-05-19:58:47-17992' creation detected
    Feb 5 19:58:47 home4 abrtd: Executable '/usr/local/sbin/gmond' doesn't belong to any package and ProcessUnpackaged is set to 'no'
    Feb 5 19:58:47 home4 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2014-02-05-19:58:47-17992' exited with 1
    Feb 5 19:58:47 home4 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2014-02-05-19:58:47-17992'

I've been trying to debug this but I have to admit that I'm coming up blank. Running gmond with debug doesn't give too much information:

    [root@home4 laytonjb]# gmond -d 5 -c /etc/ganglia/gmond.conf
    loaded