On Mon, 2004-06-21 at 12:36, steven wagner wrote:
> A few other things exploded so I've only just had the chance to check this
> out. Info:
>
> I'm not using a config file at this time. This is a Redhat 7.1 uniprocessor
> P4 but I get the same results on a dual-proc Opteron running the same
> bastardized RH7.1-derivative.
>
> When built from the 2.5.6 tarball, the monitoring core works outside of debug
> mode.

do you mean that gmond doesn't work anymore if you turn on debugging?

> When built from *this* 2.6.0 tarball, not so much ... works in debug though:
>
> 644383 Jun 3 12:55 ganglia-2.6.0.tar.gz

and when you build the previous snapshot, gmond ONLY works in debug mode?

> gdb the happy elf provides this traceback on the gmond threads which *are*
> created:
>
> #0  0x40084b85 in __sigsuspend (set=0xbffff200)
>     at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
> #1  0x401ad1c9 in __pthread_wait_for_restart_signal (self=0x401b5f40)
>     at pthread.c:969
> #2  0x401ad29c in __pthread_create_2_1 (thread=0x807be94, attr=0x0,
>     start_routine=0x80539f8 <schedule_thread>, arg=0x807be88) at restart.h:34
> #3  0x08053291 in tpool_init (tpoolp=0xbffff3a4, num_worker_threads=1,
>     max_queue_size=128, do_not_block_when_full=1) at tpool.c:100
> #4  0x08053338 in ganglia_thread_pool_create (num_worker_threads=1,
>     max_queue_size=128, do_not_block_when_full=1) at tpool.c:122
> #5  0x0804bc4b in main (argc=1, argv=0xbffff4c4) at gmond.c:254
>
> Line 254 is:
>
>   receive_pool = ganglia_thread_pool_create(
>       gmond_config.num_receive_channels, 128, 1 );

what version of glibc are you running on your boxes?

% rpm -qi glibc

i found that the pthread (LinuxThreads) implementation on linux is a
nightmare. sometimes you'll find the thread support in glibc, other times
you'll find it in the kernel. you can force gmond to use the older pthread
libraries by setting LD_ASSUME_KERNEL in the environment before you start
gmond:

% setenv LD_ASSUME_KERNEL 2.2.5

(or, in sh/bash, "export LD_ASSUME_KERNEL=2.2.5")

also, are you compiling gmond on the same host it is being run on?

i think the problem is the way signals are passed around in threaded
programs: older libraries used SIGUSR1 and SIGUSR2, while the newer
libraries use "real-time" signals. we may have to remove the thread pool
code altogether and just have a thread per channel (or put in a ./configure
flag to override the pools on broken machines).

your message is timely. i was going to send an email out today asking for
feedback on 2.6.0.
it looks like we have a trusted_hosts IPv4 <=> IPv6 problem and a thread
pool problem. any others?

-matt

-- 
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety.
  --Benjamin Franklin, Historical Review of Pennsylvania, 1759
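Related to the glibc question in the thread: a minimal C sketch that reports the C library version and, where glibc supports the query, which thread implementation (LinuxThreads or NPTL) a binary actually picks up at run time. It is glibc-specific (gnu_get_libc_version() and confstr() with _CS_GNU_LIBPTHREAD_VERSION); older glibc headers may not define the latter, in which case it reports "unknown". The helper name describe_thread_library() is hypothetical, not part of gmond.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write e.g. "glibc 2.x, threads: NPTL 2.x" into buf.
 * The thread part falls back to "unknown" when the running glibc (or the
 * headers it was built against) cannot answer the query. */
static void describe_thread_library(char *buf, size_t len)
{
    char threads[64] = "unknown";

    assert(buf != NULL && len > 0);
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns 0 if the name is unsupported; otherwise it copies
     * a (possibly truncated) NUL-terminated string into threads. */
    if (confstr(_CS_GNU_LIBPTHREAD_VERSION, threads, sizeof threads) == 0)
        strcpy(threads, "unknown");
#endif
    snprintf(buf, len, "glibc %s, threads: %s", gnu_get_libc_version(), threads);
}
```

Running this with and without LD_ASSUME_KERNEL set is one way to confirm whether the variable actually changes which thread library gets loaded on a given box.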
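If the signal theory holds, one thing worth checking is whether SIGUSR1 or SIGUSR2 is blocked or ignored in the thread that calls tpool_init(). The difference between debug and non-debug mode hints, as an assumption only, that something done when gmond detaches could matter. This is a hypothetical diagnostic sketch: signal_blocked() and signal_ignored() are illustrative names, they only inspect state, and they do not cover the real-time signals the newer thread libraries use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: 1 if sig is blocked in the calling thread's
 * signal mask, 0 if not, -1 on error. */
static int signal_blocked(int sig)
{
    sigset_t current;

    assert(sig > 0);
    /* With a NULL new set, pthread_sigmask() leaves the mask unchanged
     * and only reports the current one. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, sig);
}

/* Hypothetical helper: 1 if the process-wide disposition of sig is
 * SIG_IGN, 0 if not, -1 on error. */
static int signal_ignored(int sig)
{
    struct sigaction act;

    assert(sig > 0);
    /* With a NULL new action, sigaction() only reports the current one. */
    if (sigaction(sig, NULL, &act) != 0)
        return -1;
    return !(act.sa_flags & SA_SIGINFO) && act.sa_handler == SIG_IGN;
}
```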
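The "thread per channel" alternative mentioned above could look roughly like this sketch. struct channel, channel_worker(), start_channel_threads() and join_channel_threads() are hypothetical stand-ins, not gmond's code; a real worker would loop reading packets from its channel instead of just marking it as serviced. A ./configure flag could then choose between this path and ganglia_thread_pool_create() at compile time.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a gmond receive channel. */
struct channel {
    int id;
    int handled;   /* set by the worker so the caller can see it ran */
};

/* Hypothetical worker: marks its channel as serviced and exits. */
static void *channel_worker(void *arg)
{
    struct channel *ch = arg;
    ch->handled = 1;
    return NULL;
}

/* Start one dedicated thread per channel instead of handing work to a
 * shared pool.  Returns n on success; on failure, joins the threads that
 * were already started and returns -1. */
static int start_channel_threads(struct channel *chans, pthread_t *tids, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, channel_worker, &chans[i]) != 0) {
            for (int j = 0; j < i; j++)
                pthread_join(tids[j], NULL);
            return -1;
        }
    }
    return n;
}

/* Wait for every channel thread to finish. */
static void join_channel_threads(pthread_t *tids, int n)
{
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
}
```

One caveat: the backtrace shows the hang inside pthread_create() itself (waiting for the restart signal), and this layout still calls pthread_create(), so on its own it may not sidestep the problem. What it removes is the pool's queue and worker handoff.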
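On the trusted_hosts IPv4 <=> IPv6 problem: one likely shape, though this is an assumption since the message doesn't spell it out, is that on an AF_INET6 socket IPv4 peers show up as IPv4-mapped addresses (::ffff:a.b.c.d) and so never match an IPv4 trusted_hosts entry. Here is a minimal sketch of normalizing the peer before comparing. unmap_ipv4() and peer_matches_ipv4() are hypothetical names, and only exact IPv4 dotted-quad entries are handled.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* If addr is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), copy the
 * embedded IPv4 address into out and return 1; otherwise return 0. */
static int unmap_ipv4(const struct in6_addr *addr, struct in_addr *out)
{
    if (!IN6_IS_ADDR_V4MAPPED(addr))
        return 0;
    /* The IPv4 address is the last four bytes, already in network order. */
    memcpy(&out->s_addr, &addr->s6_addr[12], 4);
    return 1;
}

/* Hypothetical trusted_hosts check: does the peer equal the IPv4 entry
 * 'trusted', treating IPv4-mapped IPv6 peers as plain IPv4? */
static int peer_matches_ipv4(const struct sockaddr *peer, const char *trusted)
{
    struct in_addr want, have;

    assert(peer != NULL && trusted != NULL);
    if (inet_pton(AF_INET, trusted, &want) != 1)
        return 0;   /* not an IPv4 entry */
    if (peer->sa_family == AF_INET) {
        have = ((const struct sockaddr_in *)peer)->sin_addr;
    } else if (peer->sa_family == AF_INET6) {
        if (!unmap_ipv4(&((const struct sockaddr_in6 *)peer)->sin6_addr, &have))
            return 0;   /* a native IPv6 peer can't match an IPv4 entry */
    } else {
        return 0;
    }
    return have.s_addr == want.s_addr;
}
```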