Re: How to structure a GNU scientific project

2010-11-13 Thread Ralf Wildenhues
Hello Luke,

you somehow managed to send mail to the automake list without the list
address in Cc:, let's fix that.

* Luke wrote on Mon, Nov 08, 2010 at 07:36:13PM CET:
 I'm trying to organize the directory and file structure of my project
 and figure out how everything should be installed in a way that
 complies with the GCS and FHS.  Currently, my project provides several
 command line binary executables that do some numerical number
 crunching using GNU GSL.  The inputs to these binaries are a few human
 readable text files which specify some simulation parameters and
 settings.  The outputs of the executables are a couple of data files
 (time histories, level curves, etc.) and a couple of text files that
 show the simulation settings that were actually used and specify the
 format of the output data.  On top of this, I have some Python scripts
 which generate plots from the data files, and save them as pdfs.  The
 python scripts call os.system to specify the simulation inputs, run
 the simulation, and process output data to make some plots.  It also
 takes all inputs and outputs (input text files + output data + plots)
 and bundles them into a time-stamped tar.gz file so that multiple
 simulation runs don't overwrite each other, and to provide a way to go
 back and look at simulation results and know exactly what conditions
 created them.
 
 I have several sets of example input text files that allow a user to
 run the simulations with some default parameters.  I'm not clear on
 where these files should be placed in the source distribution and
 where they should be installed during 'make install'.  I would like a
 user to be able to easily find and open these text files so they can
 use them as templates for running simulations with different
 parameters.
 
 My questions are:
 1)  Where should I put these text simulation configuration files
 within my source distribution,

Wherever they suit you best.  There is no standard requirement for
this.  It is often helpful to give the source tree a directory
structure similar to that of the install tree (e.g., for the
subtree of all configuration files and directories).

 and where should they be installed to by default?

We put them below $(pkgdatadir), if they are system-independent:
  examplesdir = $(pkgdatadir)/examples
  examples_DATA = ...

Read-only configuration files for programs pertaining to a single system
can go in sysconfdir ($(prefix)/etc by default), but that is not
typically useful for simulation configuration files.

 2)  How should I make my application and/or user aware of where they
 are installed?  The way these files are used is by specifying a
 command line flag that directs the executable to parse a particular
 input file, so in order for this to be useful, the directory they are
 installed into must be known.

See 'info Autoconf Defining Directories' for how to pass configure
information to your code.  It is very useful to be able to override the
location with a command-line option, so that you can test programs in
your test suite before they are installed.
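
As a rough illustration of the override idea for the Python side, here
is a minimal sketch.  The option name --examples-dir, the function
names, and the default path are all hypothetical; the assumption is
that the build would substitute the configured $(pkgdatadir)/examples
for the placeholder default when the script is installed.

```python
import argparse

# Assumption: the build replaces this placeholder with the configured
# $(pkgdatadir)/examples when the script is installed.
DEFAULT_EXAMPLES_DIR = "/usr/local/share/mypackage/examples"


def examples_dir(cli_value=None):
    """Return the examples directory, preferring the command-line value."""
    if cli_value:
        return cli_value
    return DEFAULT_EXAMPLES_DIR


def parse_examples_args(argv=None):
    """Parse a hypothetical --examples-dir option."""
    parser = argparse.ArgumentParser(description="hypothetical simulation driver")
    parser.add_argument(
        "--examples-dir",
        default=None,
        help="directory holding the example input files "
             "(default: the installed location)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_examples_args()
    print(examples_dir(args.examples_dir))
```

A test suite could then pass --examples-dir pointing into the source
tree, so nothing needs to be installed before the tests run.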

 3)  Should the python scripts go in site-packages, or would it make
 more sense for them to be installed alongside the binary executables?

Are they programs that are independently useful on their own, i.e., may
be called by the user, or just invoked from within your code?  If the
former, I'd make them bin_SCRIPTS (so ending up in $(bindir)), otherwise
I guess it might depend.  Auxiliary programs usually go in libexecdir,
python modules in the python tree.  The python documentation might have
more suggestions here.

 4)  I use the python scripts to save everything in a results/ folder
 (and I'm often working with my source directory, so this is
 src/results).  It seems like this folder ought to be in the user's home
 directory somewhere, but maybe there are other places it would make
 sense to put this type of output data?

Output should IMVHO generally be relative to the current working
directory, and be configurable by a command-line option and/or a
setting in the configuration files.  The user should be able to run
multiple instances of your programs concurrently without having to worry
about them overwriting each other's results.  In case you worry about
MPI, I don't know of any startup mechanism nowadays that doesn't let
you specify the working directory of the running code.
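
One possible way to get this behavior in the Python driver is sketched
below.  The option name --output-dir, its default of "results", and
the function names are assumptions for illustration only.  Each run
gets its own time-stamped directory, and tempfile.mkdtemp adds a random
suffix so that two runs started in the same second do not collide.

```python
import argparse
import tempfile
import time
from pathlib import Path


def make_run_dir(base):
    """Create and return a fresh, uniquely named run directory under base."""
    # A relative base is interpreted relative to the current working directory.
    base = Path(base)
    base.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # mkdtemp creates the directory itself and appends a random suffix,
    # so concurrent runs never share a directory.
    return Path(tempfile.mkdtemp(prefix=stamp + "-", dir=str(base)))


def parse_output_args(argv=None):
    """Parse a hypothetical --output-dir option."""
    parser = argparse.ArgumentParser(description="hypothetical simulation driver")
    parser.add_argument(
        "--output-dir",
        default="results",
        help="where to create the per-run directory "
             "(default: ./results in the current working directory)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_output_args()
    print(make_run_dir(args.output_dir))
```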

 It seems like the text input files should go in a subdirectory of
 /usr/local/share, or maybe the whole project should go into a
 subdirectory of /opt.

Leave that to the user to decide.  Without configure switches, prefix
will default to /usr/local, thus datarootdir to /usr/local/share, thus
datadir to /usr/local/share, thus pkgdatadir to
/usr/local/share/$PACKAGE.  Each level can be overridden, so
'./configure --prefix=/opt' will install everything below /opt.
The unprivileged user should be able to install below her $HOME with
'./configure --prefix=$HOME/local' or so.
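
The chain of defaults described above can be written out as a small
sketch.  This only mirrors the default derivation in Python for
illustration; it is not how configure computes the values, and the
package name "mypackage" and the function name are placeholders.

```python
def default_install_dirs(prefix="/usr/local", package="mypackage"):
    """Derive the default data directories from prefix, one level at a time."""
    datarootdir = prefix + "/share"        # default: $(prefix)/share
    datadir = datarootdir                  # default: $(datarootdir)
    pkgdatadir = datadir + "/" + package   # default: $(datadir)/$(PACKAGE)
    sysconfdir = prefix + "/etc"           # default: $(prefix)/etc
    return {
        "prefix": prefix,
        "datarootdir": datarootdir,
        "datadir": datadir,
        "pkgdatadir": pkgdatadir,
        "sysconfdir": sysconfdir,
    }


if __name__ == "__main__":
    for name, value in default_install_dirs().items():
        print(name, "=", value)
```

Calling default_install_dirs(prefix="/opt") moves every derived
directory below /opt, which is the effect of './configure
--prefix=/opt' when no other directory switch is given.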


How to structure a GNU scientific project

2010-11-08 Thread Luke
I'm trying to organize the directory and file structure of my project
and figure out how everything should be installed in a way that
complies with the GCS and FHS.  Currently, my project provides several
command line binary executables that do some numerical number
crunching using GNU GSL.  The inputs to these binaries are a few human
readable text files which specify some simulation parameters and
settings.  The outputs of the executables are a couple of data files
(time histories, level curves, etc.) and a couple of text files that
show the simulation settings that were actually used and specify the
format of the output data.  On top of this, I have some Python scripts
which generate plots from the data files, and save them as pdfs.  The
python scripts call os.system to specify the simulation inputs, run
the simulation, and process output data to make some plots.  It also
takes all inputs and outputs (input text files + output data + plots)
and bundles them into a time-stamped tar.gz file so that multiple
simulation runs don't overwrite each other, and to provide a way to go
back and look at simulation results and know exactly what conditions
created them.

I have several sets of example input text files that allow a user to
run the simulations with some default parameters.  I'm not clear on
where these files should be placed in the source distribution and
where they should be installed during 'make install'.  I would like a
user to be able to easily find and open these text files so they can
use them as templates for running simulations with different
parameters.

My questions are:
1)  Where should I put these text simulation configuration files
within my source distribution, and where should they be installed to
by default?
2)  How should I make my application and/or user aware of where they
are installed?  The way these files are used is by specifying a
command line flag that directs the executable to parse a particular
input file, so in order for this to be useful, the directory they are
installed into must be known.
3)  Should the python scripts go in site-packages, or would it make
more sense for them to be installed alongside the binary executables?
4)  I use the python scripts to save everything in a results/ folder
(and I'm often working with my source directory, so this is
src/results).  It seems like this folder ought to be in the user's home
directory somewhere, but maybe there are other places it would make
sense to put this type of output data?

It seems like the text input files should go in a subdirectory of
/usr/local/share, or maybe the whole project should go into a
subdirectory of /opt.  The way I'm using the tools is by having
everything within a folder in my home directory, but this is probably
not a good way to distribute.

The basic source layout is:
top-level -- has the standard GCS files and Autotools files
    src -- source files which compile to executables, also has some
           python scripts for postprocessing the data
    src/common -- convenience libraries that are used by executables
    src/initialconditions -- text files for controlling initial
           simulation conditions
    src/parameters -- text files for specifying different numerical
           values for simulation parameters
    src/integrationsettings -- text files for choosing different
           numerical integration settings

There are anywhere from 5-20 text files in each of the
src/initialconditions, src/parameters, src/integrationsettings folders
that I want to distribute in some logical fashion.

Any thoughts?  Also, I'm using Autotools (Autoconf and Automake), so
if there are good ways to do this automagically with these tools, that
would be ideal.

Thanks,
~Luke

-- 
Dale L. Peterson
Sports Biomechanics Lab, UC Davis
http://dlpeterson.com/blog
Office:  +01 530-752-2163
Mobile: +01 805-698-6157