On Feb 21, 2007, at 2:49 PM, Trach-Minh Tran wrote:
On 02/21/2007 06:42 PM, Sam Lang wrote:
On Feb 21, 2007, at 11:28 AM, Trach-Minh Tran wrote:
On 02/21/2007 06:10 PM, Sam Lang wrote:
On Feb 21, 2007, at 10:49 AM, Trach-Minh Tran wrote:
On 02/21/2007 05:18 PM, Sam Lang wrote:
Hi Minh,
I got the order of my AC_TRY_COMPILE arguments wrong. That was pretty sloppy on my part. I've attached a patch that should fix the error you're getting. I'm not sure it will apply cleanly to the already patched 2.6.2 source that you have. Better to start with a clean 2.6.2 tarball.
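For reference, AC_TRY_COMPILE expects its arguments in the order: includes, test body, action-if-it-compiles, action-if-not. A minimal sketch with the arguments in the right order (the feature being tested here is hypothetical, not the actual check from the patch):

    dnl Hypothetical check, only to illustrate the argument order;
    dnl this is not the actual test from the attached patch.
    dnl Order: includes, test body, action-if-compiles, action-if-not.
    AC_TRY_COMPILE(
        [#include <sys/types.h>],
        [size_t s = 0; s++;],
        [AC_DEFINE(HAVE_EXAMPLE_FEATURE, 1, [Define if the example test compiles])],
        [AC_MSG_WARN([example feature not found])]
    )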
Hi Sam,
Thanks for your prompt response. I can now load the module. I will do some more tests with this 2.6.2 version. So far, using my MPI-IO program, I've found that it is not as stable as the 2.6.1 version: after about half an hour of running the test, 2 data servers (out of 8) have already died!
That's surprising; the 2.6.2 release didn't include any changes to the servers from 2.6.1. Did you get any messages in the server logs on the nodes that died?
Do you think that I should stay with 2.6.1 + the misc-bug.patch from Murali?
There aren't any other significant fixes in 2.6.2 besides support for the latest Berkeley DB release and the misc-bug patch that you mention, so using 2.6.1 shouldn't be a problem for you. That being said, if the servers crash for you on 2.6.2, it's likely that they will do so with 2.6.1 as well and you just haven't hit it yet. I'd also like to figure out exactly what is causing the servers to crash. Can you send your MPI-IO program to us?
Hi Sam,
There is nothing in the server logs! Maybe tomorrow (it is now 6:30 pm here) I will have more info from the MPI-IO runs I've just submitted.
Rob thinks this might be related to the ROMIO ad_pvfs bug reported a couple of days ago, but even so, corruption on the client shouldn't cause the servers to segfault (especially if the corruption is outside the PVFS system interfaces). If possible, it would be great to get a stack trace from one of the crashed servers.
Hi Sam,
How can I get the stack trace of the pvfs2 server when it dies?
I have run another series of tests with the MPI-IO program for another hour, but none of the servers died! I can add that when one of the servers previously died, I got the following messages from my MPI program, while nothing appeared in the pvfs2_server.log file:
Hi Minh,
To get the stack trace you need to configure pvfs with --enable-segv-backtrace. This will cause a stack trace to be printed to the log when a segfault occurs.
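Roughly, that means reconfiguring and rebuilding along these lines (a sketch assuming a standard source build; keep whatever other configure options you already use):

    # Sketch only: add the flag to your existing configure invocation,
    # then rebuild and reinstall so the servers pick up the new binaries.
    ./configure --enable-segv-backtrace [your usual options]
    make
    make install
    # restart the pvfs2-server daemons afterwards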
-sam
=====================================
[E 17:26:51.714686] msgpair failed, will retry: Broken pipe
[E 17:26:51.736877] handle_io_error: flow proto error cleanup started on 0x6fd870, error_code: -1073741973
[E 17:26:51.737091] handle_io_error: flow proto 0x6fd870 canceled 0 operations, will clean up.
[E 17:26:51.737108] handle_io_error: flow proto 0x6fd870 error cleanup finished, error_code: -1073741973
[E 17:26:53.734663] msgpair failed, will retry: Connection refused
[E 17:26:55.754647] msgpair failed, will retry: Connection refused
[E 17:26:57.774636] msgpair failed, will retry: Connection refused
[E 17:26:59.794622] msgpair failed, will retry: Connection refused
[E 17:27:01.814610] msgpair failed, will retry: Connection refused
[E 17:27:01.814651] *** msgpairarray_completion_fn: msgpair to server tcp://io4:3334 failed: Connection refused
[E 17:27:01.814666] *** Out of retries.
=====================================
-Minh.