Hi, I have just finished a package intended for use in our institution and would like to make it publicly available at the same time, so I am requesting an octave-forge account.
The package lets users run parallel computations on a cluster of machines fairly easily; usage is rather high-level and specialized (a user-supplied function is evaluated with different arguments on different machines). I attach the svn patch and also the README file; the README is included in the patch as well, but please take a look at it. The package is under extra/ since it has not been tested much. On the other hand, everything worked in the few test runs, and according to the Octave help list there might be demand for such functions. One could consider putting the package under main/; I am not sure.
Regards, Olaf
beowulf.diff.gz
package: beowulf, 2009-03-23
This package is for remote computing on a cluster of machines. It is
more high-level and more specialized than the "parallel" package and
does not use the latter. It is made for clusters with machines that
may sometimes be unavailable, or become unavailable during a job,
typically because they also have Windows installed and users sometimes
restart them to use Windows temporarily. Temporary unavailability
of the central machine during the job is also allowed for.
Prerequisites:
-- One central machine with a Unix-like OS which is running most of
the time.
-- Some other machines which at least sometimes run a Unix-like OS and
an SSH server.
-- Authentication to one of the machines gives access to all others
without the need to give the password again; at least authentication to
the central machine should give access to the others (e.g. use
Kerberos).
-- Home directories are shared across all machines (e.g. NFS) and all
Octave-related software used by the jobs is available on each
machine.
The package runs an Octave function supplied by the user with
different sets of arguments. The function is of the form
function result = f (args[, args_id])
i.e. it accepts an argument "args", which might be a structure or
cell-array to accommodate a set of arguments, and optionally "args_id",
which is the index of "args" within all its possible values (given in
a cell-array, see below). "result" may of course be a structure or
cell-array too, to accommodate more than one value.
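As an illustration of the function form described above, here is a minimal sketch of a user function. The name "my_function", the field names "x" and "scale", and the computation itself are hypothetical and only stand in for a real job.

```octave
1;  # script-file marker so the function below can be defined inline

# Hypothetical user function of the form  result = f (args[, args_id]).
function result = my_function (args, args_id)
  # "args" is a structure holding one set of arguments
  # (the field names "x" and "scale" are assumptions for this example).
  result.sum_sq = sum (args.x .^ 2) * args.scale;
  # "args_id" is optional, so only use it when it was passed.
  if (nargin > 1)
    result.id = args_id;
  endif
endfunction

# Local call with one argument set, as a quick check before using the cluster.
r = my_function (struct ("x", [1 2 3], "scale", 2), 5);
```

Returning a structure keeps several output values in the single "result" variable.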
For each set, the function is run on a different one of the currently
available machines. The user also supplies a one-dimensional
cell-array with different sets of arguments (i.e. values of "args") in
each entry. The cell-array must be stored in a file under the data
directory (given by "data_dir", default: ~/bw-data) and remain there
until computation is finished.
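A sketch of how such an argument file might be prepared follows. It assumes that the file is written with Octave's "save", that the variable inside the file may be called "args", and that the binary format is acceptable; none of this is stated above, so check the package's function documentation. The temporary directory stands in for the real data directory so that the example does not touch ~/bw-data.

```octave
# Stand-in for the real data directory (default: ~/bw-data).
data_dir = tempdir ();

# One-dimensional cell-array with one set of arguments per entry.
# The field names "x" and "scale" are hypothetical.
args = cell (1, 3);
for k = 1:3
  args{k} = struct ("x", 1:k, "scale", 10 * k);
endfor

# Save the cell-array to a file in the data directory.
# Assumption: the variable name "args" and the binary format are arbitrary choices.
fname = fullfile (data_dir, "my_args");
save ("-binary", fname, "args");

# Read it back to verify that the file is intact.
loaded = load (fname);
```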
The current state is kept in a variable "state" saved to a file whose
name is sprintf("%s-%s.state", functionname, argumentsfilename) within
a state directory (given by "state_dir", default: ~/.bw-state). The
variables "computing_machines" and "central_machine" contain a
cell-array of addresses (strings) or a single address (string),
respectively.
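For example, the location of the state file can be worked out as follows. The function name and argument file name are hypothetical; the state directory shown is the default given above.

```octave
state_dir = "~/.bw-state";         # default state directory
functionname = "my_function";      # hypothetical user function name
argumentsfilename = "my_args";     # hypothetical argument file name

# File name pattern as documented above, placed inside the state directory.
state_file = fullfile (state_dir, ...
                       sprintf ("%s-%s.state", functionname, argumentsfilename));
disp (state_file);
```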
The package reads the startup files fullfile(OCTAVE_HOME (),
"share/octave/site/m/startup/bwrc") and then "~/.bwrc", in which the
variables "data_dir", "state_dir", "computing_machines", and
"central_machine" can be set.
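A possible "~/.bwrc" is sketched below, assuming the startup file is evaluated as ordinary Octave code with plain assignments. All host names are hypothetical placeholders.

```octave
# ~/.bwrc (sketch; assumes the file is evaluated as Octave code)

data_dir = "~/bw-data";      # directory holding the argument files
state_dir = "~/.bw-state";   # directory holding the state files

# Cell-array of addresses (strings); these host names are hypothetical.
computing_machines = {"node01.example.org", "node02.example.org"};

# Single address (string); hypothetical host name.
central_machine = "central.example.org";
```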
To start a job:
Prepare a function for your job with the above properties, prepare a
cell-array of argument sets for the function, and save it in the
data directory. On any of the machines, run from Octave:
bw_start ("my_function", "argument_filename");
This starts the scheduler on the central machine in the background
(with nohup) and returns. You can then log out. If the job had already
been running before, e.g. if the scheduler had been killed for some
reason, it is restarted.
To inspect jobs:
bw_list ();
To retrieve results:
bw_retrieve (<arguments documented within the function>)
To restart all pending jobs:
bw_start () # without arguments
This may be necessary if the scheduler has been killed, the central
machine has been restarted, or the Kerberos tickets have expired.
To stop a job and/or remove the statefile:
bw_clear (<arguments documented within the function>)
Technical notes:
The scheduler forks a child process for each configured computing
machine; each child opens a permanent ssh connection with a permanent
Octave process running remotely. Each set of arguments (a single
variable) is sent over the connection and the respective result
(a single variable) is sent back. If a connection becomes unavailable,
the child process tries to restart it. The configured computing machines
are continuously scanned for available machines.
Advisory locking is used to avoid starting more than one scheduler for
a single combination of user_function/argument_file.
Variables are transferred by the functions "prcv" and "psend", which use
Octave's code that is also used in "save" and "load".
Olaf Till <[email protected]>
