Hi,

I have just finished a package intended for use in our institution,
and would like to make it publicly available at the same time, so I
would like to request an octave-forge account.

The package allows users to do parallel computation on a cluster of
machines in a fairly easy way; usage is rather high-level and
specialized (a user-supplied function is evaluated with different
arguments on different machines). I have attached the svn patch. The
README file is included in the patch, but I also append it below;
please take a look at it.

The package is under extra/ since it has not been tested much. On the
other hand, everything worked in the few test runs, and according to
the Octave-help list there may be demand for such functions. One could
consider putting the package under main/ instead; I am not sure.

Regards, Olaf

Attachment: beowulf.diff.gz

package: beowulf, 2009-03-23

This package is for remote computing on a cluster of machines. It is
more high-level and more specialized than the "parallel" package and
does not use the latter. It is made for clusters whose machines may
sometimes be unavailable, or become unavailable during a job,
typically because they also have Windows installed and users sometimes
reboot them to use Windows temporarily. Temporary unavailability of
the central machine during a job is also tolerated.

Prerequisites:

-- One central machine with a Unix-like OS which is running most of
   the time.

-- Some other machines which at least sometimes run a Unix-like OS and
   an SSH server.

-- Authenticating to one of the machines gives access to all the
   others without entering the password again; at least,
   authenticating to the central machine should give access to the
   others (e.g. use Kerberos).

-- Home directories are shared across all machines (e.g. NFS) and all
   Octave-related software used by the jobs is available on each
   machine.

The package runs an Octave function supplied by the user with
different sets of arguments. The function is of the form

function result = f (args[, args_id])

i.e. it accepts an argument "args", which may be a structure or cell
array to accommodate a set of arguments, and optionally "args_id",
the index of "args" within the list of all its values (given in a
cell array, see below). "result" may of course also be a structure or
cell array to accommodate more than one value.
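A minimal sketch of such a user function, assuming (hypothetically) that each argument set is a structure with numeric fields "a" and "b". The leading "1;" only makes the snippet runnable as an Octave script; in practice the function would presumably live in its own file my_function.m, reachable from every machine through the shared home directory.

```octave
1;  # marks this file as a script so that it may define functions

function result = my_function (args, args_id)
  ## Hypothetical job: args is assumed to be a structure with
  ## numeric fields "a" and "b".
  result = struct ();
  result.sum = args.a + args.b;
  result.product = args.a * args.b;
  ## args_id is optional; if given, record which entry was computed.
  if (nargin > 1)
    result.id = args_id;
  endif
endfunction
```

Returning a structure keeps the result a single variable while still carrying several values.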

The user also supplies a one-dimensional cell array with a different
set of arguments (i.e. a value of "args") in each entry. For each set,
the function is run on one of the currently available machines. The
cell array must be stored in a file under the data directory (given by
"data_dir", default: ~/bw-data) and remain there until the computation
is finished.
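As an illustration, an argument cell array for the function sketched above could be built and saved as follows. The file name "my_args" and the variable name "args_list" are assumptions: this README does not say which variable name the package expects inside the file, so check the package's own documentation.

```octave
## Default data directory according to this README.
data_dir = tilde_expand ("~/bw-data");
if (! exist (data_dir, "dir"))
  mkdir (data_dir);
endif

## One-dimensional cell array: one argument set per entry.
n = 5;
args_list = cell (1, n);
for k = 1:n
  args_list{k} = struct ("a", k, "b", 10 * k);
endfor

## Hypothetical file and variable names.
save (fullfile (data_dir, "my_args"), "args_list");
```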

The current state is kept in a variable "state" saved to a file whose
name is sprintf("%s-%s.state", functionname, argumentsfilename) within
a state directory (given by "state_dir", default: ~/.bw-state). The
variables "computing_machines" and "central_machine" contain a
cell-array of addresses (strings) or a single address (string),
respectively.
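For example, the state file of a job started with the hypothetical names "my_function" and "my_args" would be located as below, assuming the default state directory. Loading the file and reading "state" is shown only as an illustration; the contents of "state" are internal to the package.

```octave
state_dir = tilde_expand ("~/.bw-state");   # default state directory
functionname = "my_function";               # hypothetical job names
argumentsfilename = "my_args";

## File name pattern as described in this README.
statefile = fullfile (state_dir, ...
                      sprintf ("%s-%s.state", functionname, argumentsfilename));

if (exist (statefile, "file"))
  s = load (statefile);
  state = s.state;   # the variable "state" saved by the package
else
  printf ("no state file %s yet\n", statefile);
endif
```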

The package reads the startup files fullfile(OCTAVE_HOME (),
"share/octave/site/m/startup/bwrc") and then "~/.bwrc", in which the
variables "data_dir", "state_dir", "computing_machines", and
"central_machine" can be set.

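A hypothetical ~/.bwrc could look like the following. The host names are placeholders, and the sketch assumes the file is evaluated as Octave code, as the variable assignments described above suggest. data_dir and state_dir are set to their documented defaults only for illustration; tilde_expand is used so the paths do not depend on whether the package expands "~" itself.

```octave
## ~/.bwrc: hypothetical example; all host names are placeholders.
data_dir = tilde_expand ("~/bw-data");     # documented default
state_dir = tilde_expand ("~/.bw-state");  # documented default

## A single address (string) for the central machine ...
central_machine = "node00.example.org";

## ... and a cell array of addresses for the computing machines.
computing_machines = {"node01.example.org", ...
                      "node02.example.org", ...
                      "node03.example.org"};
```
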

To start a job:

Prepare a function for your job with the properties described above,
prepare a cell array of argument sets for the function, and save it in
the data directory. Then, on any of the machines, run from Octave:

bw_start ("my_function", "argument_filename");

This starts the scheduler on the central machine in the background
(with nohup) and returns, so you can log out afterwards. If the job
had been running before, e.g. if the scheduler had been killed for
some reason, it is restarted.


To inspect jobs:

bw_list ();


To retrieve results:

bw_retrieve (<arguments documented within the function>)


To restart all pending jobs:

bw_start () # without arguments

This may be necessary if the scheduler was killed, the central machine
was rebooted, or perhaps the Kerberos tickets expired.


To stop a job and/or remove the statefile:

bw_clear (<arguments documented within the function>)




Technical notes:

The scheduler forks a child process for each configured computing
machine, each of which opens a persistent SSH connection to a
persistent Octave process running remotely. Individual argument sets
(each a single variable) are sent over the connection, and the
respective results (each a single variable) are sent back. If a
connection becomes unavailable, the child process tries to
re-establish it. The configured computing machines are continuously
scanned for availability.

Advisory locking is used to avoid starting more than one scheduler for
a single combination of user_function/argument_file.

Variables are transferred by the functions "prcv" and "psend", which
use the same Octave code that is used by "save" and "load".



Olaf Till <[email protected]>
_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev