OS: Solaris  5.8, 5.9
File: "tarball/gmond/machines/solaris.c"
Function:  get_kstat_val( )

I've been testing the latest ganglia 2.6.0 posted on March 22nd.
I noticed that gmond segfaults and dumps core when the first online CPU is NOT
in slot #0.
When I run "kstat cpu_stat", my CPUs show up as instances #1 and #3.
The crash occurs during the ks = kstat_lookup( ) call, which is hardcoded to
look for CPU instance #0.

Anyway, I wrote the following to check the next CPU instance if the default
(zero) returns NULL.
Since most machines will likely have a CPU in slot #0, I left the initial
check as is.
If it fails, the code starts checking instances #1, #2, ... and so on until one
is found, making at most as many attempts as there are CPUs
(metriclist.cpu_num).
It's not pretty, but it appears to work.
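
For anyone who wants to follow the fallback logic without a Solaris box, here is a minimal sketch of it in isolation. It does not call libkstat: fake_kstat, fake_lookup() and first_online_cpu() are hypothetical stand-ins invented for illustration, and the simulated machine is assumed to have CPUs only at instances #1 and #3, as on my system.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kstat entry (the real type is kstat_t). */
typedef struct {
    int instance;
} fake_kstat;

/* Simulated machine: CPUs are online only as instances #1 and #3. */
static fake_kstat online_cpus[] = { { 1 }, { 3 } };

/* Hypothetical stand-in for the lookup: NULL when the instance is absent. */
static fake_kstat *fake_lookup(int instance)
{
    size_t i;

    for (i = 0; i < sizeof(online_cpus) / sizeof(online_cpus[0]); i++) {
        if (online_cpus[i].instance == instance)
            return &online_cpus[i];
    }
    return NULL;
}

/* Try instance #0 first; if that fails, probe #1, #2, ... for at most
 * max_tries further attempts. Returns NULL if nothing was found. */
static fake_kstat *first_online_cpu(int max_tries)
{
    fake_kstat *ks = fake_lookup(0);
    int instance_num = 1;
    int num_tries = 0;

    while (ks == NULL && num_tries < max_tries) {
        ks = fake_lookup(instance_num);
        ++instance_num;
        ++num_tries;
    }
    return ks;
}
```

Calling first_online_cpu(2) on this simulated layout returns the entry for instance #1. Note that capping the attempts at the CPU count means a sparse layout (say, CPUs only at #3 and #5) would still come back NULL, so the caller has to handle that case.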
Here's a diff of my changes for solaris.c:

4c4
< #include "debug_msg.h"
---
> #include "lib/debug_msg.h"

229a230,250
>
>    /* CPU_INFO module & instance check
>     * If the first online CPU is not in slot #0, gmond will segfault and core dump.
>     * If ks == NULL, then traverse the cpu instance #'s until the first online cpu is found.
>     * After the traversal, if ks == NULL, then something is very wrong with the cpu configuration.
>     */
>    if ((ks == NULL) && (strcmp(km_name, "cpu_info") == 0)) {
>          int instance_num = 1;
>          int num_tries = 0;
>          while ((num_tries < (int)metriclist.cpu_num.uint32) && (ks == NULL)) {
>              char new_ks_name[32];
>              snprintf(new_ks_name, sizeof(new_ks_name), "cpu_info%d", instance_num);
>              debug_msg("Looking up kstat:  km (unix?)='%s', ks (system_misc?)='%s', kn (resulting metric?)='%s'",
>                        km_name, new_ks_name, name);
>              ks = kstat_lookup(kc, km_name, instance_num, new_ks_name);
>              debug_msg("%s: Just did kstat_lookup().\n", new_ks_name);
>              ++instance_num;
>              ++num_tries;
>          }
>    }
>
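
A side note on the guard in the patch, which depends on km_name matching "cpu_info". In C, == between two char * values compares addresses rather than contents, so a module-name check is only reliable when done with strcmp(). A minimal sketch of such a check follows; is_cpu_info() is a hypothetical helper written for illustration and is not part of solaris.c.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when km_name names the cpu_info module,
 * 0 otherwise (including when km_name is NULL). Compares string contents,
 * not pointer values. */
static int is_cpu_info(const char *km_name)
{
    return km_name != NULL && strcmp(km_name, "cpu_info") == 0;
}
```

With this, a name built at run time (for example, copied into a local buffer) still matches, which a pointer comparison would not guarantee.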




