Hi Brad,
I faced much the same problem some months ago.
To work around it, I run my gmetad_node2 at version 3.0.1.
Below is the gmetad stack at failure time (3.0.4) in my environment:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 131081 (LWP 12739)]
*__GI___pthread_mutex_unlock (mutex=0x0) at mutex.c:178
178 mutex.c: No such file or directory.
in mutex.c
(gdb) where
#0 *__GI___pthread_mutex_unlock (mutex=0x0) at mutex.c:178
#1 0x0804e1e0 in endElement_CLUSTER ()
#2 0x0804e2ee in end ()
#3 0x0805a26e in doContent ()
#4 0x08059319 in contentProcessor ()
#5 0x0805c6ba in doProlog ()
#6 0x0805c063 in prologProcessor ()
#7 0x0805bfe9 in prologInitProcessor ()
#8 0x08058d4d in XML_ParseBuffer ()
#9 0x08058cb5 in XML_Parse ()
#10 0x0804e3d0 in process_xml ()
#11 0x0804b341 in data_thread ()
#12 0x40085c80 in pthread_start_thread (arg=0x4121dbe0) at manager.c:301
#13 0x40085d82 in pthread_start_thread_event (arg=0x4121dbe0) at manager.c:324
#14 0x401b9f87 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:100
(gdb)
(gdb) print *xmldata
$12 = {rval = 134700213, old = 2, sourcename = 0x8075c7f "", hostname = 0x0,
ds = 0x8075cbd, grid_depth = 6, host_alive = 134700224, source = {id = 29,
report_start = 0x8075cc4 <_IO_stdin_used+8768>, report_end = 0x4,
authority = 0x8075cc9, authority_ptr = 20, metric_summary = 0x8075c7f,
sum_finished = 0x0, ds = 0x8075c7f,
hosts_up = 0, hosts_down = 134700236, localtime = 21, owner = 23679,
latlong = 2055, url = 0,
stringslen = 0,
Note sum_finished = 0x0 above. Frame #1 (endElement_CLUSTER) passes that pointer to pthread_mutex_unlock in this code:
source = &xmldata->source;
summary = xmldata->source.metric_summary;
/* Release the partial sum mutex */
pthread_mutex_unlock(source->sum_finished);
/*err_msg("%s releasing lock", xmldata->sourcename);*/
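To illustrate the failure mode, here is a minimal sketch of a defensive check around that unlock. The names source_sketch and release_sum_mutex are hypothetical and are not part of gmetad. The sketch only avoids dereferencing a NULL mutex pointer. It does not explain why sum_finished is NULL in the first place, so it should not be read as the actual fix.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of gmetad's source
 * structure that matters for this crash. */
struct source_sketch {
    pthread_mutex_t *sum_finished; /* 0x0 in the gdb print above */
};

/* Hypothetical helper: release the partial sum mutex only if the
 * pointer is valid. Returns 0 on success, EINVAL when there is no
 * mutex to unlock, or the error code from pthread_mutex_unlock. */
static int release_sum_mutex(struct source_sketch *source)
{
    if (source == NULL || source->sum_finished == NULL)
        return EINVAL; /* would have been the SIGSEGV in frame #0 */
    return pthread_mutex_unlock(source->sum_finished);
}
```

With a guard like this the data thread would get an error code back instead of a segmentation fault, which could then be logged with the source name.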
Best Regards.
Christian.
----- Original Message -----
From: "Brad Anderson" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 06, 2008 8:03 PM
Subject: [Ganglia-general] dual gmetad setup
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> All,
>
> I am having issues getting a dual gmetad env up and running. Here
> is the problem. I have one gmetad node (gmetad_node1) checking a
> single cluster of 1 machine. This node works fine, rrds are being
> created and when I place a UI on top of it all is well. The trouble I
> am having is with my second gmetad node (gmetad_node2). I want this
> node to pull all its data from gmetad_node1 and store a copy of all
> rrds on its file system as well. I have turned off the "scalable"
> option in gmetad.conf, and it starts to collect the first round of
> data but dies shortly after writing rrds. I have included a log of
> gmetad_node2 start up with debug at 10.
>
> any help on this issue would be appreciated.
>
> Regards,
> Brad Anderson
>
>
> gmetad_node1:
> - CentOS 4.4
> - ganglia-gmetad-3.0.6-1
> - ganglia-web-3.0.6-1
> - monitoring a single cluster of 1 machine
> - writes rrds locally to disk
>
>
> gmetad_node2:
> - CentOS 4.4
> - ganglia-gmetad-3.0.6-1
> - ganglia-web-3.0.6-1
> - scalable off
> - single data_source of gmetad_node1
>
>
> gmetad_node2 startup debug log:
> /etc/init.d/gmetad restart
> Shutting down GANGLIA gmetad: [FAILED]
> Starting GANGLIA gmetad: Going to run as user nobody
> Sources are ...
> Source: [grid1, step 30] has 1 sources
> 10.0.0.1
> xml listening on port 8651
> interactive xml listening on port 8652
> Data thread -1271247952 is monitoring [grid1] data source
> 10.0.0.1
> cleanup thread has been started
> [grid1] is a 2.5 or later data stream
> hash_create size = 1024
> hash->size is 1031
> hash_create size = 50
> hash->size is 53
> hash_create size = 50
> hash->size is 53
> Updating host host1.domain.com, metric disk_free
> Updating host host1.domain.com, metric bytes_out
> Updating host host1.domain.com, metric proc_total
> Updating host host1.domain.com, metric pkts_in
> Updating host host1.domain.com, metric cpu_nice
> Updating host host1.domain.com, metric cpu_speed
> Updating host host1.domain.com, metric boottime
> Updating host host1.domain.com, metric qmail_msgs_to_be_preprocessed
> Updating host host1.domain.com, metric cpu_wio
> Updating host host1.domain.com, metric qmail_msgs_in_queue
> Updating host host1.domain.com, metric load_one
> Updating host host1.domain.com, metric disk_total
> Updating host host1.domain.com, metric cpu_idle
> Updating host host1.domain.com, metric cpu_user
> Updating host host1.domain.com, metric swap_free
> Updating host host1.domain.com, metric mem_cached
> Updating host host1.domain.com, metric pkts_out
> Updating host host1.domain.com, metric load_five
> Updating host host1.domain.com, metric cpu_num
> Updating host host1.domain.com, metric load_fifteen
> Updating host host1.domain.com, metric mem_free
> Updating host host1.domain.com, metric cpu_system
> Updating host host1.domain.com, metric proc_run
> Updating host host1.domain.com, metric mem_total
> Updating host host1.domain.com, metric cpu_aidle
> Updating host host1.domain.com, metric bytes_in
> Updating host host1.domain.com, metric mem_buffers
> Updating host host1.domain.com, metric mem_shared
> Updating host host1.domain.com, metric swap_total
> Updating host host1.domain.com, metric part_max_used
> Writing Summary data for source Servers, metric disk_free
> Writing Summary data for source Servers, metric bytes_out
> Writing Summary data for source Servers, metric proc_total
> Writing Summary data for source Servers, metric cpu_nice
> Writing Summary data for source Servers, metric pkts_in
> Writing Summary data for source Servers, metric cpu_speed
> Writing Summary data for source Servers, metric boottime
> Writing Summary data for source Servers, metric qmail_msgs_to_be_preprocessed
> Writing Summary data for source Servers, metric cpu_wio
> Writing Summary data for source Servers, metric qmail_msgs_in_queue
> Writing Summary data for source Servers, metric load_one
> Writing Summary data for source Servers, metric disk_total
> Writing Summary data for source Servers, metric cpu_user
> Writing Summary data for source Servers, metric cpu_idle
> Writing Summary data for source Servers, metric swap_free
> Writing Summary data for source Servers, metric pkts_out
> Writing Summary data for source Servers, metric mem_cached
> Writing Summary data for source Servers, metric load_five
> Writing Summary data for source Servers, metric cpu_num
> Writing Summary data for source Servers, metric load_fifteen
> Writing Summary data for source Servers, metric mem_free
> Writing Summary data for source Servers, metric cpu_system
> Writing Summary data for source Servers, metric proc_run
> Writing Summary data for source Servers, metric mem_total
> Writing Summary data for source Servers, metric cpu_aidle
> Writing Summary data for source Servers, metric bytes_in
> Writing Summary data for source Servers, metric mem_buffers
> Writing Summary data for source Servers, metric mem_shared
> Writing Summary data for source Servers, metric swap_total
> Writing Summary data for source Servers, metric part_max_used
> [FAILED]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFH0D/7qOVHpERMGj0RAgFdAJ9Opr4bGThQwqxza7EdUtmW0cShXgCbBDNS
> X9jO6tMkwKjcvnLlsNJy1J4=
> =ed0P
> -----END PGP SIGNATURE-----
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>