swift 08/05/19 20:56:20

Modified: hpc-howto.xml
Log:
Coding style (sorry, length on uris not fixable)
Revision Changes Path 1.14 xml/htdocs/doc/en/hpc-howto.xml file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/hpc-howto.xml?rev=1.14&view=markup plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/hpc-howto.xml?rev=1.14&content-type=text/plain diff : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/hpc-howto.xml?r1=1.13&r2=1.14 Index: hpc-howto.xml =================================================================== RCS file: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v retrieving revision 1.13 retrieving revision 1.14 diff -u -r1.13 -r1.14 --- hpc-howto.xml 18 Dec 2006 21:47:19 -0000 1.13 +++ hpc-howto.xml 19 May 2008 20:56:20 -0000 1.14 @@ -1,5 +1,5 @@ <?xml version='1.0' encoding="UTF-8"?> -<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.13 2006/12/18 21:47:19 nightmorph Exp $ --> +<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/hpc-howto.xml,v 1.14 2008/05/19 20:56:20 swift Exp $ --> <!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> <guide link="/doc/en/hpc-howto.xml"> @@ -28,7 +28,7 @@ permission to distribute this document as-is and update it when appropriate as long as the adelie linux R&D notice stays --> - + <abstract> This document was written by people at the Adelie Linux R&D Center <http://www.adelielinux.com> as a step-by-step guide to turn a Gentoo @@ -44,22 +44,22 @@ <body> <p> -Gentoo Linux, a special flavor of Linux that can be automatically optimized -and customized for just about any application or need. Extreme performance, +Gentoo Linux, a special flavor of Linux that can be automatically optimized +and customized for just about any application or need. Extreme performance, configurability and a top-notch user and developer community are all hallmarks of the Gentoo experience. 
</p> <p> -Thanks to a technology called Portage, Gentoo Linux can become an ideal secure +Thanks to a technology called Portage, Gentoo Linux can become an ideal secure server, development workstation, professional desktop, gaming system, embedded -solution or... a High Performance Computing system. Because of its +solution or... a High Performance Computing system. Because of its near-unlimited adaptability, we call Gentoo Linux a metadistribution. </p> <p> -This document explains how to turn a Gentoo system into a High Performance -Computing system. Step by step, it explains what packages one may want to +This document explains how to turn a Gentoo system into a High Performance +Computing system. Step by step, it explains what packages one may want to install and helps configure them. </p> @@ -86,10 +86,10 @@ <p> During the installation process, you will have to set your USE variables in -<path>/etc/make.conf</path>. We recommended that you deactivate all the +<path>/etc/make.conf</path>. We recommended that you deactivate all the defaults (see <path>/etc/make.profile/make.defaults</path>) by negating them in -make.conf. However, you may want to keep such use variables as x86, 3dnow, gpm, -mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation +make.conf. However, you may want to keep such use variables as x86, 3dnow, gpm, +mmx, nptl, nptlonly, sse, ncurses, pam and tcpd. Refer to the USE documentation for more information. </p> @@ -114,8 +114,8 @@ </note> <p> -In step 15 ("Installing the kernel and a System Logger") for stability -reasons, we recommend the vanilla-sources, the official kernel sources +In step 15 ("Installing the kernel and a System Logger") for stability +reasons, we recommend the vanilla-sources, the official kernel sources released on <uri>http://www.kernel.org/</uri>, unless you require special support such as xfs. 
</p> @@ -125,7 +125,7 @@ </pre> <p> -When you install miscellaneous packages, we recommend installing the +When you install miscellaneous packages, we recommend installing the following: </p> @@ -140,35 +140,35 @@ <body> <p> -A cluster requires a communication layer to interconnect the slave nodes to -the master node. Typically, a FastEthernet or GigaEthernet LAN can be used -since they have a good price/performance ratio. Other possibilities include -use of products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri +A cluster requires a communication layer to interconnect the slave nodes to +the master node. Typically, a FastEthernet or GigaEthernet LAN can be used +since they have a good price/performance ratio. Other possibilities include +use of products like <uri link="http://www.myricom.com/">Myrinet</uri>, <uri link="http://quadrics.com/">QsNet</uri> or others. </p> <p> -A cluster is composed of two node types: master and slave. Typically, your +A cluster is composed of two node types: master and slave. Typically, your cluster will have one master node and several slave nodes. </p> <p> -The master node is the cluster's server. It is responsible for telling the -slave nodes what to do. This server will typically run such daemons as dhcpd, -nfs, pbs-server, and pbs-sched. Your master node will allow interactive +The master node is the cluster's server. It is responsible for telling the +slave nodes what to do. This server will typically run such daemons as dhcpd, +nfs, pbs-server, and pbs-sched. Your master node will allow interactive sessions for users, and accept job executions. </p> <p> -The slave nodes listen for instructions (via ssh/rsh perhaps) from the master -node. They should be dedicated to crunching results and therefore should not +The slave nodes listen for instructions (via ssh/rsh perhaps) from the master +node. They should be dedicated to crunching results and therefore should not run any unnecessary services. 
</p> <p> -The rest of this documentation will assume a cluster configuration as per the -hosts file below. You should maintain on every node such a hosts file -(<path>/etc/hosts</path>) with entries for each node participating node in the +The rest of this documentation will assume a cluster configuration as per the +hosts file below. You should maintain on every node such a hosts file +(<path>/etc/hosts</path>) with entries for each node participating node in the cluster. </p> @@ -185,7 +185,7 @@ </pre> <p> -To setup your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path> +To setup your cluster dedicated LAN, edit your <path>/etc/conf.d/net</path> file on the master node. </p> @@ -202,7 +202,7 @@ <p> -Finally, setup a DHCP daemon on the master node to avoid having to maintain a +Finally, setup a DHCP daemon on the master node to avoid having to maintain a network configuration on each slave node. </p> @@ -239,22 +239,22 @@ <body> <p> -The Network File System (NFS) was developed to allow machines to mount a disk +The Network File System (NFS) was developed to allow machines to mount a disk partition on a remote machine as if it were on a local hard drive. This allows for fast, seamless sharing of files across a network. </p> <p> There are other systems that provide similar functionality to NFS which could -be used in a cluster environment. The <uri -link="http://www.openafs.org">Andrew File System -from IBM</uri>, recently open-sourced, provides a file sharing mechanism with -some additional security and performance features. The <uri -link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in -development, but is designed to work well with disconnected clients. Many +be used in a cluster environment. The <uri +link="http://www.openafs.org">Andrew File System +from IBM</uri>, recently open-sourced, provides a file sharing mechanism with +some additional security and performance features. 
The <uri +link="http://www.coda.cs.cmu.edu/">Coda File System</uri> is still in +development, but is designed to work well with disconnected clients. Many of the features of the Andrew and Coda file systems are slated for inclusion in the next version of <uri link="http://www.nfsv4.org">NFS (Version 4)</uri>. -The advantage of NFS today is that it is mature, standard, well understood, +The advantage of NFS today is that it is mature, standard, well understood, and supported robustly across a variety of platforms. </p> @@ -277,8 +277,8 @@ </pre> <p> -On the master node, edit your <path>/etc/hosts.allow</path> file to allow -connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, +On the master node, edit your <path>/etc/hosts.allow</path> file to allow +connections from slave nodes. If your cluster LAN is on 192.168.1.0/24, your <path>hosts.allow</path> will look like: </p> @@ -287,7 +287,7 @@ </pre> <p> -Edit the <path>/etc/exports</path> file of the master node to export a work +Edit the <path>/etc/exports</path> file of the master node to export a work directory structure (/home is good for this). </p> @@ -304,8 +304,8 @@ </pre> <p> -To mount the nfs exported filesystem from the master, you also have to -configure your salve nodes' <path>/etc/fstab</path>. Add a line like this +To mount the nfs exported filesystem from the master, you also have to +configure your salve nodes' <path>/etc/fstab</path>. Add a line like this one: </p> @@ -314,7 +314,7 @@ </pre> <p> -You'll also need to set up your nodes so that they mount the nfs filesystem by +You'll also need to set up your nodes so that they mount the nfs filesystem by issuing this command: </p> @@ -329,15 +329,15 @@ <body> <p> -SSH is a protocol for secure remote login and other secure network services -over an insecure network. OpenSSH uses public key cryptography to provide -secure authorization. 
Generating the public key, which is shared with remote -systems, and the private key which is kept on the local system, is done first +SSH is a protocol for secure remote login and other secure network services +over an insecure network. OpenSSH uses public key cryptography to provide +secure authorization. Generating the public key, which is shared with remote +systems, and the private key which is kept on the local system, is done first to configure OpenSSH on the cluster. </p> <p> -For transparent cluster usage, private/public keys may be used. This process +For transparent cluster usage, private/public keys may be used. This process has two steps: </p> @@ -374,12 +374,12 @@ </pre> <note> -Host keys must have an empty passphrase. RSA is required for host-based +Host keys must have an empty passphrase. RSA is required for host-based authentication. </note> <p> -For host based authentication, you will also need to edit your +For host based authentication, you will also need to edit your <path>/etc/ssh/shosts.equiv</path>. </p> @@ -397,7 +397,7 @@ # $OpenBSD: sshd_config,v 1.42 2001/09/20 20:57:51 mouring Exp $ # This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin -# This is the sshd server system-wide configuration file. See sshd(8) +# This is the sshd server system-wide configuration file. See sshd(8) # for more information. # HostKeys for protocol version 2 @@ -405,7 +405,7 @@ </pre> <p> -If your application require RSH communications, you will need to emerge +If your application require RSH communications, you will need to emerge net-misc/netkit-rsh and sys-apps/xinetd. </p> @@ -417,7 +417,7 @@ </pre> <p> -Then configure the rsh deamon. Edit your <path>/etc/xinet.d/rsh</path> file. +Then configure the rsh deamon. Edit your <path>/etc/xinet.d/rsh</path> file. 
</p> <pre caption="rsh"> @@ -456,7 +456,7 @@ <pre caption="hosts.allow"> # Adelie Linux Research & Development Center -# /etc/hosts.allow +# /etc/hosts.allow ALL:192.168.1.0/255.255.255.0 </pre> @@ -489,20 +489,20 @@ <body> <p> -The Network Time Protocol (NTP) is used to synchronize the time of a computer -client or server to another server or reference time source, such as a radio -or satellite receiver or modem. It provides accuracies typically within a -millisecond on LANs and up to a few tens of milliseconds on WANs relative to -Coordinated Universal Time (UTC) via a Global Positioning Service (GPS) +The Network Time Protocol (NTP) is used to synchronize the time of a computer +client or server to another server or reference time source, such as a radio +or satellite receiver or modem. It provides accuracies typically within a +millisecond on LANs and up to a few tens of milliseconds on WANs relative to +Coordinated Universal Time (UTC) via a Global Positioning Service (GPS) receiver, for example. Typical NTP configurations utilize multiple redundant -servers and diverse network paths in order to achieve high accuracy and +servers and diverse network paths in order to achieve high accuracy and reliability. </p> <p> -Select a NTP server geographically close to you from <uri -link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time -Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and +Select a NTP server geographically close to you from <uri +link="http://www.eecis.udel.edu/~mills/ntp/servers.html">Public NTP Time +Servers</uri>, and configure your <path>/etc/conf.d/ntp</path> and <path>/etc/ntp.conf</path> files on the master node. 
</p> @@ -549,7 +549,7 @@ </pre> <p> -Edit your <path>/etc/ntp.conf</path> file on the master to setup an external +Edit your <path>/etc/ntp.conf</path> file on the master to setup an external synchronization source: </p> @@ -565,7 +565,7 @@ restrict ntp2.cmc.ec.gc.ca stratum 10 driftfile /etc/ntp.drift.server -logfile /var/log/ntp +logfile /var/log/ntp broadcast 192.168.1.255 restrict default kod restrict 127.0.0.1 @@ -573,7 +573,7 @@ </pre> <p> -And on all your slave nodes, setup your synchronization source as your master +And on all your slave nodes, setup your synchronization source as your master node. </p> @@ -594,7 +594,7 @@ restrict master stratum 11 driftfile /etc/ntp.drift.server -logfile /var/log/ntp +logfile /var/log/ntp restrict default kod restrict 127.0.0.1 </pre> @@ -608,7 +608,7 @@ </pre> <note> -NTP will not update the local clock if the time difference between your +NTP will not update the local clock if the time difference between your synchronization source and the local clock is too great. </note> @@ -691,10 +691,10 @@ <body> <p> -The Portable Batch System (PBS) is a flexible batch queueing and workload +The Portable Batch System (PBS) is a flexible batch queueing and workload management system originally developed for NASA. It operates on networked, -multi-platform UNIX environments, including heterogeneous clusters of -workstations, supercomputers, and massively parallel systems. Development of +multi-platform UNIX environments, including heterogeneous clusters of +workstations, supercomputers, and massively parallel systems. Development of PBS is provided by Altair Grid Technologies. </p> @@ -703,12 +703,12 @@ </pre> <note> -OpenPBS ebuild does not currently set proper permissions on var-directories +OpenPBS ebuild does not currently set proper permissions on var-directories used by OpenPBS. </note> <p> -Before starting using OpenPBS, some configurations are required. 
The files +Before starting using OpenPBS, some configurations are required. The files you will need to personalize for your system are: </p> @@ -762,10 +762,10 @@ </pre> <p> -To submit a task to OpenPBS, the command <c>qsub</c> is used with some -optional parameters. In the example below, "-l" allows you to specify +To submit a task to OpenPBS, the command <c>qsub</c> is used with some +optional parameters. In the example below, "-l" allows you to specify the resources required, "-j" provides for redirection of standard out and -standard error, and the "-m" will e-mail the user at beginning (b), end (e) +standard error, and the "-m" will e-mail the user at beginning (b), end (e) and on abort (a) of the job. </p> @@ -775,8 +775,8 @@ </pre> <p> -Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you -may want to try a task manually. To request an interactive shell from OpenPBS, +Normally jobs submitted to OpenPBS are in the form of scripts. Sometimes, you +may want to try a task manually. To request an interactive shell from OpenPBS, use the "-I" parameter. </p> @@ -802,16 +802,16 @@ <body> <p> -Message passing is a paradigm used widely on certain classes of parallel -machines, especially those with distributed memory. MPICH is a freely -available, portable implementation of MPI, the Standard for message-passing +Message passing is a paradigm used widely on certain classes of parallel +machines, especially those with distributed memory. MPICH is a freely +available, portable implementation of MPI, the Standard for message-passing libraries. </p> <p> -The mpich ebuild provided by Adelie Linux allows for two USE flags: -<e>doc</e> and <e>crypt</e>. <e>doc</e> will cause documentation to be -installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead +The mpich ebuild provided by Adelie Linux allows for two USE flags: +<e>doc</e> and <e>crypt</e>. 
<e>doc</e> will cause documentation to be +installed, while <e>crypt</e> will configure MPICH to use <c>ssh</c> instead of <c>rsh</c>. </p> @@ -821,7 +821,7 @@ </pre> <p> -You may need to export a mpich work directory to all your slave nodes in +You may need to export a mpich work directory to all your slave nodes in <path>/etc/exports</path>: </p> @@ -830,15 +830,15 @@ </pre> <p> -Most massively parallel processors (MPPs) provide a way to start a program on -a requested number of processors; <c>mpirun</c> makes use of the appropriate +Most massively parallel processors (MPPs) provide a way to start a program on +a requested number of processors; <c>mpirun</c> makes use of the appropriate command whenever possible. In contrast, workstation clusters require that each -process in a parallel job be started individually, though programs to help -start these processes exist. Because workstation clusters are not already -organized as an MPP, additional information is required to make use of them. -Mpich should be installed with a list of participating workstations in the -file <path>machines.LINUX</path> in the directory -<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose +process in a parallel job be started individually, though programs to help +start these processes exist. Because workstation clusters are not already +organized as an MPP, additional information is required to make use of them. +Mpich should be installed with a list of participating workstations in the +file <path>machines.LINUX</path> in the directory +<path>/usr/share/mpich/</path>. This file is used by <c>mpirun</c> to choose processors to run on. </p> @@ -848,11 +848,11 @@ <pre caption="/usr/share/mpich/machines.LINUX"> # Change this file to contain the machines that you want to use -# to run MPI jobs on. The format is one host name per line, with either +# to run MPI jobs on. 
The format is one host name per line, with either # hostname # or # hostname:n -# where n is the number of processors in an SMP. The hostname should +# where n is the number of processors in an SMP. The hostname should # be the same as the result from the command "hostname" master node01 @@ -863,18 +863,18 @@ </pre> <p> -Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that -you can use all of the machines that you have listed. This script performs -an <c>rsh</c> and a short directory listing; this tests that you both have -access to the node and that a program in the current directory is visible on -the remote node. If there are any problems, they will be listed. These +Use the script <c>tstmachines</c> in <path>/usr/sbin/</path> to ensure that +you can use all of the machines that you have listed. This script performs +an <c>rsh</c> and a short directory listing; this tests that you both have +access to the node and that a program in the current directory is visible on +the remote node. If there are any problems, they will be listed. These problems must be fixed before proceeding. </p> <p> -The only argument to <c>tstmachines</c> is the name of the architecture; this -is the same name as the extension on the machines file. For example, the -following tests that a program in the current directory can be executed by +The only argument to <c>tstmachines</c> is the name of the architecture; this +is the same name as the extension on the machines file. For example, the +following tests that a program in the current directory can be executed by all of the machines in the LINUX machines list. 
</p> @@ -883,7 +883,7 @@ </pre> <note> -This program is silent if all is well; if you want to see what it is doing, +This program is silent if all is well; if you want to see what it is doing, use the -v (for verbose) argument: </note> @@ -905,24 +905,24 @@ </pre> <p> -If <c>tstmachines</c> finds a problem, it will suggest possible reasons and +If <c>tstmachines</c> finds a problem, it will suggest possible reasons and solutions. In brief, there are three tests: </p> <ul> <li> - <e>Can processes be started on remote machines?</e> tstmachines attempts - to run the shell command true on each machine in the machines files by + <e>Can processes be started on remote machines?</e> tstmachines attempts + to run the shell command true on each machine in the machines files by using the remote shell command. </li> <li> - <e>Is current working directory available to all machines?</e> This - attempts to ls a file that tstmachines creates by running ls using the + <e>Is current working directory available to all machines?</e> This + attempts to ls a file that tstmachines creates by running ls using the remote shell command. </li> <li> <e>Can user programs be run on remote systems?</e> This checks that shared - libraries and other components have been properly installed on all + libraries and other components have been properly installed on all machines. </li> </ul> @@ -939,7 +939,7 @@ </pre> <p> -For further information on MPICH, consult the documentation at <uri +For further information on MPICH, consult the documentation at <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm">http://www-unix.mcs.anl.gov/mpi/mpich/docs/mpichman-chp4/mpichman-chp4.htm</uri>. 
</p> @@ -973,44 +973,44 @@ <body> <p> -The original document is published at the <uri -link="http://www.adelielinux.com">Adelie Linux R&D Centre</uri> web site, -and is reproduced here with the permission of the authors and <uri -link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&D +The original document is published at the <uri +link="http://www.adelielinux.com">Adelie Linux R&D Centre</uri> web site, +and is reproduced here with the permission of the authors and <uri +link="http://www.cyberlogic.ca">Cyberlogic</uri>'s Adelie Linux R&D Centre. </p> <ul> <li><uri>http://www.gentoo.org</uri>, Gentoo Foundation, Inc.</li> <li> - <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, + <uri link="http://www.adelielinux.com">http://www.adelielinux.com</uri>, Adelie Linux Research and Development Centre </li> <li> - <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, + <uri link="http://nfs.sourceforge.net/">http://nfs.sourceforge.net</uri>, Linux NFS Project </li> <li> - <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, + <uri link="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich/</uri>, Mathematics and Computer Science Division, Argonne National Laboratory </li> <li> <uri link="http://www.ntp.org/">http://ntp.org</uri> </li> <li> - <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, + <uri link="http://www.eecis.udel.edu/~mills/">http://www.eecis.udel.edu/~mills/</uri>, David L. 
Mills, University of Delaware </li> <li> - <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, + <uri link="http://www.ietf.org/html.charters/secsh-charter.html">http://www.ietf.org/html.charters/secsh-charter.html</uri>, Secure Shell Working Group, IETF, Internet Society </li> <li> - <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, + <uri link="http://www.linuxsecurity.com/">http://www.linuxsecurity.com/</uri>, Guardian Digital </li> <li> - <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, + <uri link="http://www.openpbs.org/">http://www.openpbs.org/</uri>, Altair Grid Technologies, LLC. </li> </ul>
