On Feb 15, 2007, at 7:47 PM, Mark Van De Vyver wrote:
> Hi,
> Thank you for all the effort put into making PVFS2 available.
> I'm relatively new to Linux (from WinXP), and have built a 3 node
> cluster using the Rocks Cluster software v4.2.1. I've installed the
> PVFS2 roll and by following the PVFS2 roll guide all has proceeded
> very smoothly - really, thanks - I'd expected a few days/weeks to get
> to this point.
>
> At the end of this email I pose some questions that the following
> behavior has raised.
>
> About my set-up:
> A single user. I made no changes to the PVFS configuration
> established by the PVFS2 roll, and have one head node and two
> compute-I/O nodes.
> PVFS version 1.5.1
>
> The unexpected behavior:
> Using pvfs2-cp I have copied approx 900GB of files from several DVDs
> using dd (I dd to a tmpfs area then pvfs2-cp this 'image' to
> /mnt/pvfs2/some/path).
> I have noticed that this runs fine so long as it is the first time the
> file is copied. If I use pvfs2-rm to delete a file, not necessarily
> from the same node used to make the copy, the following occurs (all
> nodes seem to be up and working fine):
> - I can see the file is removed using the gnome file browser.
> - The pvfs2-rm seems to hang, and the following message is displayed:
>
> [E 15:10:02.584608] Job time out: cancelling bmi operation, job_id: 21.
> [E 15:10:02.584769] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
>
Hi Mark,
It looks like the first failure with pvfs2-rm caused one of the
servers to crash, giving the appearance that pvfs2-rm was hanging.
Did it time out after about 5 minutes or so? The error message you
saw is reporting that timeout.
> If I try to re-copy the file (using pvfs2-cp), again, not necessarily
> from the same node it was first copied on, then I see the following
> and the copy fails.
>
> [E 15:26:53.690560] Job time out: cancelling bmi operation, job_id: 25.
> [E 15:26:53.690710] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
> [E 15:26:53.690733] *** msgpairarray_completion_fn: msgpair to server tcp://pvfs2-compute-0-1:3334 failed: Operation cancelled (possibly due to timeout)
The failure here with pvfs2-cp at this point is also because the
server crashed in the previous pvfs2-rm.
> [E 15:26:53.690743] *** No retries requested.
> pvfs2-cp: src/client/sysint/sys-getattr.sm:331: getattr_acache_lookup: Assertion `object_ref.handle != ((PVFS_handle)0)' failed.
> /
>
This is a bug: when pvfs2-cp fails due to a timeout, we shouldn't
hit an assertion failure. I will look into this, although it may
already have been fixed since 1.5.1.
> On rebooting one of the nodes I was forced to run fsck, after this the
> cluster seems to have returned to 'normal'.
You can probably just restart the servers to get things back.
>
> The good news is that the std linux commands: cp and rm don't seem to
> have any trouble, so I am using those at the moment..... I couldn't
> find any advice that cp, etc, is preferred to pvfs2-cp, or vice versa.
I think in general a lot more effort is made to get the kernel module
working properly than the client tools (pvfs2-*). That being said,
we don't discourage the use of the client tools, they just don't get
as much pounding, and they aren't written to match the functionality
that the VFS provides.
>
> 1) Is this a known issue that is fixed in PVFS 2.6?
The issue I think is why pvfs2-rm causes the server(s) to crash. If
possible, could you send us the logs of the servers? They should be
in /tmp/pvfs2-server.log.
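If the server logs are large, a small filter can pull out just the error
entries before sending. The following is only a rough sketch: it assumes
the server log uses the same "[E hh:mm:ss.usec] message" layout as the
client messages quoted above, which may not hold for your version. The
names LOG_LINE and error_lines, and the debug line in the sample, are
made up for illustration.

```python
import re

# Assumed layout, copied from the client messages quoted in this thread:
#   [E 15:10:02.584608] Job time out: cancelling bmi operation, job_id: 21.
# The server log may use a different layout; adjust the pattern if so.
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\] (?P<msg>.*)$"
)


def error_lines(lines):
    """Return (time, message) pairs for entries whose level letter is 'E'."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "E":
            found.append((match.group("time"), match.group("msg")))
    return found


# Two lines taken from the messages above, plus one hypothetical debug line.
sample = [
    "[E 15:10:02.584608] Job time out: cancelling bmi operation, job_id: 21.\n",
    "[D 15:10:02.584700] hypothetical debug entry\n",
    "[E 15:10:02.584769] msgpair failed, will retry: Operation cancelled "
    "(possibly due to timeout)\n",
]

for time, msg in error_lines(sample):
    print(time, msg)
```

To run it on a real log, pass an open file instead of the sample list,
for example error_lines(open("/tmp/pvfs2-server.log")). Entries that wrap
onto a second line are not rejoined by this sketch.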
> 2) Is it fine to continue to use v1.5.1 so long as I don't use the
> PVFS-* commands?
Yes. There are known bugs in the 1.5.1 release, but they aren't
likely to cause any problems for what you're doing.
> 3) Is upgrading to v2.6 on a rocks cluster 'straight forward', or is
> it likely to involve some 'debugging' and a few days work - bear in
> mind my relative inexperience with Linux.
I've never installed Rocks so I'm going to have to let someone else
answer that. We pride ourselves on making PVFS easy to install and
deploy, and that hasn't changed in the newer releases.
-sam
>
> Regards
> Mark
<pvfs2-client.log.frontend>
<pvfs2-client.log.compute-0-0>
<pvfs2-client.log.compute-0-1>