Hi all: More or less since I installed pvfs2, I've had recurring stability issues. Presently, my cluster head node has 3 processes, each using 100% of a core, that are "hung" on I/O (all of that processor usage is in "system", not "user"), but the processes are not in "D" state (they are moving between S and R). Each should have completed in an hour or less; they have now been running for over 18 hours. They also do not respond to kill (including kill -9). From the sound of the users' message, any additional processes started in the same working directory will hang in the same way.
This happens a lot. Presently, the 3 hung processes are a binary specific to the research (x2) and gzip; often, the hung processes are ls and ssh (for scp), etc. When this happens, all other physical systems are still fully functional. This has happened repeatedly (although I cannot reproduce it on demand) on versions 1.5 through 1.8.1. The only recovery option I have found to date is to reboot the system. This normally only happens on the head node, but the head node is also where a lot of the user I/O takes place (especially a lot of small I/O accesses such as a few scp sessions, some gzips, and 5-10 users doing ls, mv, and cp operations).

Given what I understand about pvfs2's current user base, I'd think it must be stable; a large cluster could never run pvfs2 and still be useful to users with the kind of instability I keep experiencing. As such, I suspect the problem is somewhere in my system/setup, but to date pcarns and others on #pvfs2 have not been able to identify what it is.

These stability issues are significantly affecting the usability of the cluster and, of course, are beginning to drive users away from it and/or undermine their confidence in my competence in administering it. Yet from what I can tell, I'm experiencing some bug in the pvfs kernel module. I'd really like to get this problem fixed, and I'm at a loss as to how, other than replacing pvfs2 with some other filesystem, which I'd rather not do. How do I fix this problem without replacing pvfs2?

--Jim
