Michael Barnes wrote:
On Fri, Jan 11, 2008 at 04:57:16PM +0100, Bas van der Vlies wrote:
Michael Barnes wrote:
Maui users,

Michael,

 Try the latest snapshot of maui (maui-3.2.6p20-snap.1182974819). If I
remember correctly, there is a bug in maui-3.2.6p19: a patch was not applied
correctly, and therefore you get a segv.

I am also running the latest snapshot without any problems.

Maybe this is a Fedora Core 7 thing.  I just compiled and installed this
snapshot.  This is how I ran configure:

# these are the same flags that all of the FC7 RPMs use

export CFLAGS="-D__M64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic"

cd maui-3.2.6p20/

./configure --prefix=/usr/local


And it ran 2 jobs, and now it's acting funny.

Sometimes jobs will run, sometimes not.

I also get this:

checkjob 169101.pbsold
ERROR:    lost connection to server
ERROR:    cannot request service (status)


Same with:

showq
ERROR:    lost connection to server
ERROR:    cannot request service (status)


I do an strace on the running maui process and I see:

select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)

over and over again.
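
For reference, a sleep/select/accept loop that keeps returning EAGAIN is what a non-blocking listening socket looks like when no client connection is waiting, so the trace on its own may only show an idle server. The following is a minimal Python sketch of that pattern, not maui's actual code; the function name poll_once and the loopback listener are made up for illustration.

```python
import errno
import select
import socket


def poll_once(listener, timeout=0.01):
    """Run one iteration of a select-then-accept polling loop.

    Returns the accepted connection, or None if no client was waiting.
    """
    # Wait briefly, then try accept() regardless of what select() reported,
    # similar to the loop visible in the strace output.
    select.select([listener], [], [], timeout)
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError as exc:
        # On a non-blocking socket an empty accept queue shows up as
        # EAGAIN/EWOULDBLOCK rather than a blocking call.
        if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
            raise
        return None


# Hypothetical listener on an ephemeral loopback port (not maui's real port).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
listener.setblocking(False)

# Nobody has connected yet, so this iteration finds nothing to accept.
idle_result = poll_once(listener)
```

If a client does connect, the next call to poll_once returns a connected socket instead of None. In the case above, the clients fail before they ever reach the server, which would be consistent with the server never seeing anything to accept.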

An strace on the client command says this many times (as root and me):

bind(6, {sa_family=AF_INET, sin_port=htons(831), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)


I see nothing similar in the working version (meaning there is no bind()
call).
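
Port 831 is below 1024, and on Linux binding to such a port normally requires root or the CAP_NET_BIND_SERVICE capability. That would explain EACCES for an ordinary user, but not for root, so it may be worth reproducing the bind outside of the maui client. The following is a rough Python sketch for doing that; try_bind is a made-up helper, and the port number is simply the one from the trace.

```python
import errno
import socket


def try_bind(port, host="0.0.0.0"):
    """Try to bind a TCP socket to (host, port).

    Returns None on success, or the symbolic errno name (for example
    "EACCES" or "EADDRINUSE") if the bind fails.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # 831 is the port seen in the strace output; 0 asks the kernel for any
    # free unprivileged port and serves as a sanity check.
    for port in (831, 0):
        print(port, try_bind(port) or "ok")
```

Running this both as a normal user and as root shows whether the denial is specific to the maui client or applies to any process trying to bind that port on this machine.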



I don't know what else to try besides reinstalling the OS in 32bit mode,
which is not a big deal.  But if anybody has any suggestions, I'm open
to them.

Another piece of information is that I am running the pbs_server in
debug mode, but AFAIK, this only keeps it from forking and it dumps out
some stuff on the terminal.



I don't know what more to try.



-mb



--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------

Did you try setting the loglevel to 9 and checking maui.log for error messages? All the tools, including the client tools (diagnose, checkjob, ...), communicate with the server, so if the server (maui) crashes, nothing works anymore.


It seems like maui closes the socket, maybe IPv6 related or something. Just a guess.

Regards


--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: [EMAIL PROTECTED]      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************
_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers
