On Fri, Jan 11, 2008 at 04:57:16PM +0100, Bas van der Vlies wrote:
> Michael Barnes wrote:
> >Maui users,
> >
> Michael,
> 
>  Try the latest snapshot of maui (maui-3.2.6p20-snap.1182974819). If I 
> remember correctly, there is a bug in maui-3.2.6p19: a patch was not applied
> correctly, and therefore you get a segv.
> 
> I am also running the latest snapshot without any problems.

Maybe this is a Fedora Core 7 thing.  I just compiled and installed this
snapshot.  This is how I ran configure:

# these are the same flags that all of the FC7 RPMs use 

export CFLAGS="-D__M64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic"

cd maui-3.2.6p20/

./configure --prefix=/usr/local


It ran 2 jobs, and now it's acting funny.

Sometimes jobs will run, sometimes not.

I also get this:

checkjob 169101.pbsold
ERROR:    lost connection to server
ERROR:    cannot request service (status)


Same with:

showq
ERROR:    lost connection to server
ERROR:    cannot request service (status)
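
One way to narrow down errors like these is to check whether anything is accepting TCP connections on the scheduler's port at all. The following is a minimal sketch, not part of Maui: the function name can_connect is made up for illustration, and the host and port in the usage lines are placeholders that would have to be replaced with the server host and SERVERPORT from the local maui.cfg.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values (assumptions): use the host and SERVERPORT
    # configured in your own maui.cfg.
    print(can_connect("localhost", 42559))
```

A successful connect only shows that some process is listening on that port. It does not exercise the client protocol, so the client commands can still fail for other reasons.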


When I run strace on the running maui process, I see:

select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)

over and over again.
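
For reference, accept() returning EAGAIN is what a non-blocking listening socket reports whenever no client connection is pending, so a loop like the one above can simply mean the daemon is idle and no client ever reached it. The sketch below reproduces that pattern in Python. It is an illustration of the system call behavior only, not Maui's actual code, and the names make_listener and poll_once are made up for the example.

```python
import select
import socket


def make_listener() -> socket.socket:
    """Create a non-blocking TCP listener on an ephemeral localhost port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(5)
    s.setblocking(False)
    return s


def poll_once(listener: socket.socket, timeout: float = 0.01):
    """Wait briefly with select(), then try accept() once.

    Returns the accepted connection, or None when no client was waiting.
    """
    # The select() result is ignored on purpose: like the trace, this
    # sketch calls accept() whether or not select() reported activity.
    select.select([listener], [], [], timeout)
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError:
        # EAGAIN / EWOULDBLOCK: nothing in the accept queue right now.
        return None
```

Calling poll_once() repeatedly on an idle listener yields None every time, which corresponds to the repeating select/accept/EAGAIN lines in the trace.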

An strace of the client command shows this many times (both as root and as me):

bind(6, {sa_family=AF_INET, sin_port=htons(831), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)


I see nothing like this from the working version (meaning its strace shows
no bind() call).
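
Port 831 is below 1024, and on Linux binding to such a port normally requires root or the CAP_NET_BIND_SERVICE capability, which would account for EACCES when running as an ordinary user. It would not by itself account for the same error as root. The following sketch tries the same kind of bind and reports the result, so it can be run under both accounts for comparison. The function name try_bind is made up for the example, and it only tests the bind() call, not anything specific to the Maui client.

```python
import errno
import socket


def try_bind(port: int) -> str:
    """Try to bind a TCP socket to 0.0.0.0:port.

    Returns "ok" on success, otherwise the symbolic errno name
    (for example "EACCES" or "EADDRINUSE").
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("0.0.0.0", port))
            return "ok"
        except OSError as exc:
            # Map the numeric errno to its name; fall back to the number.
            return errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    # 831 is the port seen in the strace output above.
    print(try_bind(831))
```

If this prints EACCES when run as root as well, the refusal is coming from somewhere other than the usual privileged-port check.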



I don't know what else to try besides reinstalling the OS in 32-bit mode,
which is not a big deal.  But if anybody has any suggestions, I'm open
to them.

Another piece of information is that I am running the pbs_server in
debug mode, but AFAIK, this only keeps it from forking and it dumps out
some stuff on the terminal.



I don't know what more to try.



-mb



-- 
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------
_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers
