Pratik,
Is the call stack the same for both MPI processes?
Utkarsh
On Sat, May 28, 2011 at 5:00 PM, pratik pratik.mal...@gmail.com wrote:
Hi Utkarsh,
This is what I get:
(gdb) where
#0 0x2b58ed6e41f0 in __nanosleep_nocancel () from
/lib64/libpthread.so.0
#1 0x2b58e5b2f8c8 in
Hi Utkarsh,
Please excuse me on this, but the cluster I was working on will not
be available to me for the next week or so. As soon as it is back up,
I will test this and send the results to you (most likely by Saturday/Sunday).
Thanks,
pratik
On Monday 30 May 2011 07:01 PM, Utkarsh Ayachit
Sounds good. Thanks.
If the call stack is the same on both processes, it may be that
MPI_Finalize is freezing. Then we can try to track down why
MPI_Finalize would freeze, and that would help us get to the root
cause of the issue.
Utkarsh
On Mon, May 30, 2011 at 10:00 AM, pratik
Once the pvserver process has hung, connect to any one of the processes with
gdb. Use gdb --pid=<pid> to attach to a particular process. Then type where and
that will give you the call stack. Do this for both of the processes and we'd know
where they are stuck.
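For reference, a session following the steps above might look like this (the PID 12345 is a placeholder; substitute the actual pvserver PID, e.g. from ps or pgrep pvserver):

```
$ gdb --pid=12345
(gdb) where
#0  ... innermost frame of the attached pvserver process ...
#1  ... caller frames, showing where the process is blocked ...
(gdb) detach
(gdb) quit
```

Detaching before quitting leaves the attached process running, which matters if you want to inspect the second process afterwards.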
Utkarsh
On May 27, 2011, at 10:58
Hi Utkarsh,
This is what I get:
(gdb) where
#0 0x2b58ed6e41f0 in __nanosleep_nocancel () from
/lib64/libpthread.so.0
#1 0x2b58e5b2f8c8 in MPI_SGI_millisleep (milliseconds=<value
optimized out>) at sleep.c:36
#2 0x2b58e5b27c2c in MPI_SGI_slow_request_wait
(request=0x7fffc53bdb5c,
pvserver is designed to quit after the client disconnects. Does this
show up after a particular operation, or always? Is this a debug
build? Can you post the stack trace for the place where the server is
waiting after the client quits?
On Fri, May 27, 2011 at 12:16 AM, pratik
Hi Utkarsh,
This problem always shows up, even if I do not load any data.
The build is: CMAKE_BUILD_TYPE=DEBUG (PV 3.10.1, both server and client)
Can you please tell me how I can get a stack trace of pvserver?
best,
pratik
On Friday 27 May 2011 06:51 PM, Utkarsh Ayachit wrote:
Also,
1) pvserver disconnects properly when run on head node standalone
2) pvserver disconnects properly when run on *one* node via PBS job
scheduler.
The problem arises when I try to use more than one node in the cluster.
thanks,
pratik
On Friday 27 May 2011 08:25 PM, pratik wrote:
Hi Utkarsh,
Hi Guys,
I have noticed this as well. I have been dealing with it by using qdel
to make sure pvserver gets shut down and the allocation doesn't get
wasted. I thought this happened because I used ncat to forward a port,
so I never looked into it.
I've used a bash script to submit the
Hi everyone,
I have pvserver connecting to my client through a reverse connection.
However, after I disconnect from the GUI, the job (submitted through
PBSPro 10.6) continues to run, and I checked and saw that pvserver was
still running on the nodes. Is there any option by which the PBS job
can be terminated automatically?
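One common way to handle the situation described above is a job script that blocks on the pvserver launch, so the PBS job ends (and the allocation is released) as soon as the last pvserver rank exits. A minimal sketch, where the node/process counts, the mpiexec launcher, and the client hostname are all placeholder assumptions to adjust for your site:

```
#!/bin/bash
#PBS -l nodes=2:ppn=8
#PBS -l walltime=01:00:00
#PBS -N pvserver

cd "$PBS_O_WORKDIR"

# mpiexec does not return until every pvserver rank has exited, so the
# job ends as soon as pvserver shuts down after the client disconnects.
mpiexec -np 16 pvserver --reverse-connection --client-host=my.workstation.example
```

If the ranks hang in MPI_Finalize on disconnect, as in this thread, the walltime limit above still acts as a safety net; until the hang itself is fixed, the qdel workaround mentioned earlier remains the reliable way to free the allocation promptly.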