On Feb 15, 2007, at 7:47 PM, Mark Van De Vyver wrote:
Hi,
Thank you for all the effort put into making PVFS2 available.
I'm relatively new to Linux (from WinXP), and have built a 3 node
cluster using the Rocks Cluster software v4.2.1. I've installed the
PVFS2 roll and by following the PVFS2 roll guide all has proceeded
very smoothly - really, thanks - I'd expected a few days/weeks to get
to this point.
At the end of this email I pose some questions that the following
behavior has raised.
About my set-up:
A single user. I made no changes to the PVFS configuration
established by the PVFS2 roll, and have one head node and two
compute-I/O nodes.
PVFS version 1.5.1
The unexpected behavior:
Using pvfs2-cp I have copied approx 900GB of files from several DVDs
using dd (I dd to a tmpfs area then pvfs2-cp this 'image' to
/mnt/pvfs2/some/path).
I have noticed that this runs fine so long as it is the first time the
file is copied. If I use pvfs2-rm to delete a file, not necessarily
from the same node used to make the copy, the following occurs (all
nodes seem to be up and working fine):
- I can see the file is removed using the gnome file browser.
- The pvfs2-rm seems to hang, and the following message is displayed:
[E 15:10:02.584608] Job time out: cancelling bmi operation, job_id:
21.
[E 15:10:02.584769] msgpair failed, will retry: Operation cancelled
(possibly due to timeout)
Hi Mark,
It looks like the first failure with pvfs2-rm caused one of the
servers to crash, giving the appearance that pvfs2-rm was hanging.
It probably timed out after about 5 minutes or so? The error message
is reporting that timeout.
If I try to re-copy the file (using pvfs2-cp), again, not necessarily
from the same node it was first copied on, then I see the following
and the copy fails:
[E 15:26:53.690560] Job time out: cancelling bmi operation, job_id:
25.
[E 15:26:53.690710] msgpair failed, will retry: Operation cancelled
(possibly due to timeout)
[E 15:26:53.690733] *** msgpairarray_completion_fn: msgpair to server
tcp://pvfs2-compute-0-1:3334 failed: Operation cancelled (possibly due
to timeout)
The failure here with pvfs2-cp at this point is also because the
server crashed in the previous pvfs2-rm.
[E 15:26:53.690743] *** No retries requested.
pvfs2-cp: src/client/sysint/sys-getattr.sm:331: getattr_acache_lookup:
Assertion `object_ref.handle != ((PVFS_handle)0)' failed.
This is a bug: when pvfs2-cp fails due to a timeout, it shouldn't hit
an assertion failure. I will look into this, although it may have
already been fixed since 1.5.1.
On rebooting one of the nodes I was forced to run fsck, after this the
cluster seems to have returned to 'normal'.
You can probably just restart the servers to get things back.
The good news is that the standard Linux commands cp and rm don't seem
to have any trouble, so I am using those at the moment. I couldn't
find any advice that cp, etc., is preferred to pvfs2-cp, or vice versa.
I think in general a lot more effort is made to get the kernel module
working properly than the client tools (pvfs2-*). That being said,
we don't discourage the use of the client tools; they just don't get
as much pounding, and they aren't written to match the functionality
that the VFS provides.
1) Is this a known issue that is fixed in PVFS 2.6?
The real question, I think, is why pvfs2-rm causes the server(s) to
crash. If possible, could you send us the logs from the servers? They
should be in /tmp/pvfs2-server.log.
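If the logs are large, something like the following could pull out
just the error entries before sending them. This is only a minimal
sketch: it assumes the server log uses the same "[E hh:mm:ss.usec]"
prefix as the client output quoted above, which may not hold, and the
function name error_entries is made up for illustration.

```python
import re

# Matches the "[E 15:10:02.584608]" prefix seen in the client output above.
# Assumption: the server log marks errors with the same prefix.
ERROR_PREFIX = re.compile(r"^\[E \d{2}:\d{2}:\d{2}\.\d+\]")


def error_entries(lines):
    """Return error entries, joining wrapped continuation lines onto each."""
    entries = []
    in_error = False
    for raw in lines:
        line = raw.rstrip("\n")
        if ERROR_PREFIX.match(line):
            entries.append(line)
            in_error = True
        elif line.startswith("["):
            in_error = False  # an entry at some other log level starts here
        elif in_error and line.strip():
            entries[-1] += " " + line.strip()
    return entries


# Demo on two lines taken from the pvfs2-rm output earlier in this thread.
sample = [
    "[E 15:10:02.584608] Job time out: cancelling bmi operation, job_id:\n",
    "21.\n",
]
for entry in error_entries(sample):
    print(entry)
```

To run it on a real log, pass an open file instead of the sample list,
e.g. error_entries(open("/tmp/pvfs2-server.log")) on each I/O node.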
2) Is it fine to continue to use v1.5.1 so long as I don't use the
pvfs2-* commands?
Yes. There are known bugs in the 1.5.1 release, but they aren't
likely to cause any problems for what you're doing.
3) Is upgrading to v2.6 on a Rocks cluster straightforward, or is
it likely to involve some 'debugging' and a few days' work? Bear in
mind my relative inexperience with Linux.
I've never installed Rocks so I'm going to have to let someone else
answer that. We pride ourselves on making PVFS easy to install and
deploy, and that hasn't changed in the newer releases.
-sam
Regards
Mark
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users