On Fri, Jan 11, 2008 at 04:57:16PM +0100, Bas van der Vlies wrote:
> Michael Barnes wrote:
> >Maui users,
> >
> Michael,
>
> Try the latest snapshot of maui (maui-3.2.6p20-snap.1182974819). If I
> remember correctly, there is a bug in maui-3.2.6p19: a patch was not applied
> correctly and therefore you get a segv.
>
> I am also running the latest snapshot without any problems.

Maybe this is a Fedora Core 7 thing. I just compiled and installed this
snapshot. This is how I ran configure:

    # these are the same flags that all of the FC7 RPMs use
    export CFLAGS="-D__M64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic"
    cd maui-3.2.6p20/
    ./configure --prefix=/usr/local

It ran 2 jobs, and now it's acting funny. Sometimes jobs will run,
sometimes not. I also get this:

    checkjob 169101.pbsold
    ERROR:    lost connection to server
    ERROR:    cannot request service (status)

Same with:

    showq
    ERROR:    lost connection to server
    ERROR:    cannot request service (status)

When I run strace on the running maui process, I see:

    select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
    select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
    accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
    select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
    select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
    accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
    select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
    select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
    accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
    select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
    select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
    accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
    select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
    select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
    accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)

over and over again.
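
For reference, accept() failing with EAGAIN is what a non-blocking
listening socket returns whenever no client is waiting, so that loop by
itself may only mean the server is idle and polling. A minimal sketch of
the same select()/accept() pattern in Python follows. It is not maui's
code: the names make_listener and poll_accept_once are hypothetical, the
loopback address and timeout are assumptions, and it watches a single
socket, whereas the trace selects on fd 8 and accepts on fd 5.

```python
import errno
import os
import select
import socket


def make_listener():
    """Create a non-blocking TCP listener on an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    s.listen(5)
    s.setblocking(False)
    return s


def poll_accept_once(listener, timeout=0.01):
    """One pass of a select()/accept() poll.

    Returns the accepted socket, or the errno value if no client is waiting.
    """
    # Wait briefly for the listener to become readable; the result is ignored
    # here so that accept() is attempted either way, as in the trace.
    select.select([listener], [], [], timeout)
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError as exc:
        # No pending connection on a non-blocking socket: EAGAIN/EWOULDBLOCK.
        return exc.errno


if __name__ == "__main__":
    listener = make_listener()
    result = poll_accept_once(listener)
    if isinstance(result, int):
        print("accept() failed:", os.strerror(result))
    listener.close()
```

If this idle pattern is all the server ever shows while a client command
is running, the client's connection is probably not reaching the
listening socket at all.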

An strace on the client command shows this many times (both as root and
as my own user):

    bind(6, {sa_family=AF_INET, sin_port=htons(831), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)

I see nothing similar with the working version (meaning there is no
bind() call there).

I don't know what else to try besides reinstalling the OS in 32-bit mode,
which is not a big deal. But if anybody has any suggestions, I'm open to
them.

Another piece of information is that I am running pbs_server in debug
mode, but AFAIK this only keeps it from forking and makes it dump some
output on the terminal.

-mb

-- 
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------
_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers