We saw the same problem occur when we blocked the root account and moved to a sudo environment: sync_files would not work. Shiang-Tai's solution is what we followed as well.
-Ibad Kureshi
HPC Admin, University of Huddersfield

________________________________________
From: Shiang-Tai Lin [st...@ntu.edu.tw]
Sent: Monday, May 28, 2012 11:57 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] corrupt passwd files on nodes

Hi Lutz,

Our cluster had the same problem once or twice in the past. The password files on the nodes can be synchronized with those of the server with the following command:

/opt/sync_files/bin/sync_files

This command is executed every 15 minutes via cron (cat /etc/crontab):

*/15 * * * * root env USER=root /opt/sync_files/bin/sync_files >/dev/null 2>&1

If something goes wrong during the sync process, the passwd file will be corrupted. In this case, we had to log in to the node in single-user mode (so that the root password is not needed) and then copy the password files from the server.

Regards,
ST

On 2012/5/28 09:01 PM, dr...@directbox.com wrote:

Hi all,

I have now repeatedly encountered a problem and would like to know if it is a known / widespread one: time and again some (or all) of the nodes of one of our clusters become completely inaccessible (i.e. one cannot ssh or console-login to the nodes). By rebooting a node from a live medium one finds that /etc/passwd has size 0; since I also find that /etc/group and /etc/shadow have the same date, I assume that OSCAR has some mechanism to distribute these files according to some schedule, and that corruption can occur while those files are being pushed down from the head node. Am I right?

Now my question is: how could one analyze why the cluster does this, and how could one fix it?

Regards
Dr Lutz Ackermann
MMC - UL

PS: It's an OSCAR 5 cluster installed on a RedHat derivative:

$ cat /proc/version
Linux version 2.6.9-78.ELsmp (brewbuil...@ls20-bc2-14.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 15:46:26 EDT 2008
$ cat /etc/*release
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users

--
Shiang-Tai Lin, Professor
Department of Chemical Engineering
National Taiwan University
TEL: +886-2-33661369
FAX: +886-2-23623040
Email: st...@ntu.edu.tw
Webpage: http://web.che.ntu.edu.tw/stlin/

---
This transmission is confidential and may be legally privileged. If you receive it in error, please notify us immediately by e-mail and remove it from your system. If the content of this e-mail does not relate to the business of the University of Huddersfield, then we do not endorse it and will accept no liability.
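
The failure described in this thread (a sync run leaving /etc/passwd with size 0) could in principle be guarded against by checking a candidate file before it replaces the live one. What follows is only a minimal sketch under stated assumptions: the function names passwd_looks_sane and install_if_sane are hypothetical and are not part of OSCAR or sync_files, and the thread does not say how sync_files itself writes its files. The check is deliberately simple (non-empty file with a root entry whose UID is 0).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of OSCAR or sync_files.

# Return 0 if the file looks like a usable passwd file:
# it must be non-empty and contain a "root" entry with UID 0.
passwd_looks_sane() {
    file=$1
    [ -s "$file" ] || return 1
    awk -F: '$1 == "root" && $3 == "0" { found = 1 } END { exit !found }' "$file"
}

# Install a candidate file over a target only if it passes the check.
# The candidate is first copied to a temporary name next to the target and
# then renamed, so the target is never left half-written.
install_if_sane() {
    candidate=$1
    target=$2
    passwd_looks_sane "$candidate" || {
        echo "refusing to install $candidate over $target" >&2
        return 1
    }
    tmp="$target.tmp.$$"
    cp -p "$candidate" "$tmp" && mv -f "$tmp" "$target"
}
```

On a test copy one might run, for example, install_if_sane /tmp/passwd.from-head /tmp/passwd.test (both paths are placeholders) and confirm that an empty candidate is rejected before trying anything of the sort near the real /etc/passwd.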