Reproducibility

2010-04-30 Thread Teemu Ikonen
On Fri, Apr 30, 2010 at 2:08 AM, Michael Hanke michael.ha...@gmail.com wrote:
 Debian: The ultimate platform for neuroimaging research
[...]
 However, it is hard to blame the respective developers, because the
 sheer number of existing combinations of operating systems, hardware,
 and library versions makes it almost impossible to verify that a
 particular software is working as intended.  Restricting the
 ``supported'' runtime environment is one approach of making
 verification efforts feasible.

Dear list,

This nice abstract inspired me to think about reproducibility of
program runs. If one runs e.g. Debian unstable the OS code which can
potentially affect the results of calculations can change almost
daily. Reproducing results later can be close to impossible unless
versions of all the related libraries etc. are written down somewhere.

Does anyone here have good ideas on how to ensure reproducibility in
the long term? The only thing that comes to my mind is to run all
important calculations in a virtual machine image which is then signed
and stored in case the results need verification. But, maybe there are
other options?
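
For a start, one could at least record the versions of every installed
package next to the results. A minimal sketch of that idea (assuming a
Debian host with dpkg-query on the PATH; the results directory and file
name are just placeholders):

import datetime
import os
import subprocess

def write_package_manifest(results_dir):
    """Dump 'package version' for every installed package next to the results."""
    manifest = subprocess.check_output(
        ["dpkg-query", "-W", "-f", "${Package} ${Version}\n"]).decode()
    stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
    path = os.path.join(results_dir, "package-manifest-%s.txt" % stamp)
    with open(path, "w") as out:
        out.write(manifest)
    return path

if __name__ == "__main__":
    print(write_package_manifest("."))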


Best,

Teemu





Re: Reproducibility

2010-04-30 Thread Michael Hanke
Hi,

On Fri, Apr 30, 2010 at 10:01:23AM +0200, Teemu Ikonen wrote:
 On Fri, Apr 30, 2010 at 2:08 AM, Michael Hanke michael.ha...@gmail.com 
 wrote:
  Debian: The ultimate platform for neuroimaging research
 [...]
  However, it is hard to blame the respective developers, because the
  sheer number of existing combinations of operating systems, hardware,
  and library versions makes it almost impossible to verify that a
  particular software is working as intended.  Restricting the
  ``supported'' runtime environment is one approach of making
  verification efforts feasible.
 
 Dear list,
 
 This nice abstract inspired me to think about reproducibility of
 program runs. If one runs e.g. Debian unstable the OS code which can
 potentially affect the results of calculations can change almost
 daily. Reproducing results later can be close to impossible unless
 versions of all the related libraries etc. are written down somewhere.

This is not just a potential problem -- we have seen it happen already.
Part of the problem is that in Debian we prefer dynamic linking against
up-to-date shared libs from separate packages -- instead of statically
linking against ancient versions with known behavior (for good reasons,
of course).

 Does anyone here have good ideas on how to ensure reproducibility in
 the long term? The only thing that comes to my mind is to run all
 important calculations in a virtual machine image which is then signed
 and stored in case the results need verification. But, maybe there are
 other options?

IMHO, better than relying on a snapshot of the OS and a particular software
state to get constant results, projects should have comprehensive
regression tests that ensure proper behavior. The problem is, however,
that we cannot run them during package build time, since they tend to
require large datasets and run for many hours. Therefore users need to
do that, but nobody does it.
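
To illustrate what I mean, such a test does not have to be fancy -- re-run
the analysis on a reference dataset and compare against a stored result
within a tolerance. A hypothetical sketch (run_analysis, the file names and
the tolerance are placeholders, not any particular project's test suite):

import numpy as np

def run_analysis(dataset_path):
    # Placeholder for the project's actual, typically hours-long, computation.
    data = np.loadtxt(dataset_path)
    return data.mean(axis=0)

def regression_test(dataset_path, reference_path, rtol=1e-6):
    # Compare a fresh result against a reference produced in a known-good
    # environment, allowing for small floating-point differences.
    result = run_analysis(dataset_path)
    reference = np.loadtxt(reference_path)
    if not np.allclose(result, reference, rtol=rtol):
        raise AssertionError("result drifted away from the stored reference")

if __name__ == "__main__":
    regression_test("large_dataset.txt", "reference_result.txt")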


Michael


-- 
GPG key:  1024D/3144BE0F Michael Hanke
http://mih.voxindeserto.de





Re: Reproducibility

2010-04-30 Thread Andreas Tille
On Fri, Apr 30, 2010 at 07:07:21AM -0400, Michael Hanke wrote:
  This nice abstract inspired me to think about reproducibility of
  program runs. If one runs e.g. Debian unstable the OS code which can
  potentially affect the results of calculations can change almost
  daily. Reproducing results later can be close to impossible unless
  versions of all the related libraries etc. are written down somewhere.
 
 This is not just a potential problem -- we have seen it happen already.
 Part of the problem is that in Debian we prefer dynamic linking against
 up-to-date shared libs from separate packages -- instead of statically
 linking against ancient versions with known behavior (for good reasons,
 of course).

I can confirm that this is actually the reason why at the Sanger Institute
(even though three DDs work there) plain Debian (and specifically the
Debian Med packages) is not used.  The requirement of the scientists is
to stick to a very specific version of the packages (not necessarily those
which are part of a stable Debian release), and some labs use different
versions than other labs.
 
 IMHO, better than relying on a snapshot of the OS and a particular software
 state to get constant results, projects should have comprehensive
 regression tests that ensure proper behavior.

In theory this is probably right, but in practice it needs extra manpower,
which I doubt will be spent on problems like this.

 The problem is, however,
 that we cannot run them during package build time, since they tend to
 require large datasets and run for many hours. Therefore users need to
 do that, but nobody does it.

Yes, that's the problem.

Kind regards

Andreas. 

-- 
http://fam-tille.de





Re: Reproducibility

2010-04-30 Thread Andreas Tille
On Fri, Apr 30, 2010 at 09:30:16AM -0300, David Bremner wrote:
  Yes, that's the problem.
 
 For stable releases though, we have the time, and we can (I suspect) get
 the compute cycles to run heavy regression tests. Would that be a
 worthwhile project? 

Well, it is not me who raised this problem, so I do not feel really
able to give a definite answer.  But as I understood it, the scientists at
Sanger do not really care about stable Debian.  They care
about a very specific version of a specific piece of software.  Perhaps the
version in stable is too old - or it might even be too new (if they want
to reproduce old results).  I really doubt that these people, who are
used to sticking to such a version, will care about Debian's regression
tests if they have the chance to simply install their own version.

So for the application at Sanger I mentioned it is probably wasted
energy / manpower.  For other cases it might make sense, yes.

Kind regards

Andreas.

-- 
http://fam-tille.de





Re: Reproducibility

2010-04-30 Thread Brett Viren
Teemu Ikonen tpiko...@gmail.com writes:

 Does anyone here have good ideas on how to ensure reproducibility in
 the long term? 

Regression testing, as mentioned, or running some fixed analysis and
statistically comparing the results to past runs.
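
For example (a rough sketch only -- the file names and significance
threshold are made up, and scipy is assumed), "statistically comparing"
can be as simple as a two-sample Kolmogorov-Smirnov test between the
current output and an archived one:

import numpy as np
from scipy import stats

def consistent_with_past_run(current_path, past_path, alpha=0.01):
    # A small p-value from the two-sample KS test suggests the two result
    # samples are drawn from genuinely different distributions.
    current = np.loadtxt(current_path)
    past = np.loadtxt(past_path)
    statistic, p_value = stats.ks_2samp(current, past)
    return p_value > alpha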

We worry about reproducibility in my field of particle physics.  We run
on many different Linux and Mac platforms and strive for statistical
consistency (see below) not identical consistency.  I don't recall there
ever being an issue with different versions of, say, Debian system
libraries.  Any inconsistencies we have found have been due to version
shear in different copies of our own codes.

[Aside: I have seen gross differences between Debian and RH-derived
platforms.  In a past experiment I was the only collaborator working on
Debian and almost everyone else was using Scientific Linux (RHEL
derivative).  I kept getting bit by our code crashing on me.  It seems,
for some reason, my compilations tended to put garbage in uninitialized
pointers where on SL they tended to get NULL.  So, I was the lucky one
to find and fix a lot of programming mistakes.  This could have just
been a fluke, I have no explanation for it.]

 The only thing that comes to my mind is to run all
 important calculations in a virtual machine image which is then signed
 and stored in case the results need verification. But, maybe there are
 other options?

We have found that running the exact same code and the same Debian OS on
differing CPUs will lead to differing results.  They differ because the
IEEE FP standard isn't implemented exactly the same on all CPUs.  The
results will differ in only the least significant digits.  But if you
use simulations that consume random numbers and compare them against FP
values, this can lead to much grosser divergences.  However, with a large
enough sample the results are all statistically consistent.

I don't know how that translates when using virtual machines on
different host CPUs, but if you care about bit-for-bit identity, these
FP differences may percolate up through the VM and ruin that.  Anyway,
in the end, all CPUs give the wrong results since FP calculations are
not infinitely precise, so striving for bit-for-bit consistency is kind
of a pipe dream.
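
A toy illustration of the mechanism (nothing to do with our actual code;
the numbers are made up): summing the same values in a different order
already changes the last digits, and any cut that happens to fall between
the two sums then flips, which is how tiny FP differences can snowball
into event-by-event divergences that are still statistically consistent:

import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100000)]

# The same numbers summed in a different order need not give bit-identical
# IEEE-754 results; any difference sits in the least significant digits.
forward = sum(values)
backward = sum(reversed(values))
print(forward, backward, forward - backward)

# If a selection cut happens to lie between the two sums, the decision for
# this "event" flips even though the inputs were identical.
cut = (forward + backward) / 2.0
print(forward > cut, backward > cut)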


-Brett.





Re: Reproducibility

2010-04-30 Thread Adam C Powell IV
On Fri, 2010-04-30 at 14:18 +0200, Andreas Tille wrote:
 I can confirm that this is actually the reason why at the Sanger Institute
 (even though three DDs work there) plain Debian (and specifically the
 Debian Med packages) is not used.

FYI, I uploaded a new version of the Med packages on Monday which I
believe passes all of the tests (at least André Espaze got them all to
pass when he tried the package a while ago).

-Adam
-- 
GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6

Engineering consulting with open source tools
http://www.opennovation.com/




Re: Reproducibility

2010-04-30 Thread Antonio Paiva
Those of you interested in reproducibility might be interested in
VisTrails. There is a start-up commercializing the software, but most
of it is free and development is open source, available from
http://www.vistrails.org/index.php/Downloads. I remember that the
software keeps track of the libraries, OS, and CPU that the code is
using to get the results.

Best,
António Rafael C. Paiva
Post-doctoral fellow
SCI Institute, University of Utah
Salt Lake City, UT



On Fri, Apr 30, 2010 at 8:51 AM, Brett Viren b...@bnl.gov wrote:
 Teemu Ikonen tpiko...@gmail.com writes:

 Does anyone here have good ideas on how to ensure reproducibility in
 the long term?

 Regression testing, as mentioned, or running some fixed analysis and
 statistically comparing the results to past runs.

 We worry about reproducibility in my field of particle physics.  We run
 on many different Linux and Mac platforms and strive for statistical
 consistency (see below) not identical consistency.  I don't recall there
 ever being an issue with different versions of, say, Debian system
 libraries.  Any inconsistencies we have found have been due to version
 shear in different copies of our own codes.

 [Aside: I have seen gross differences between Debian and RH-derived
 platforms.  In a past experiment I was the only collaborator working on
 Debian and almost everyone else was using Scientific Linux (RHEL
 derivative).  I kept getting bit by our code crashing on me.  It seems,
 for some reason, my compilations tended to put garbage in uninitialized
 pointers where on SL they tended to get NULL.  So, I was the lucky one
 to find and fix a lot of programming mistakes.  This could have just
 been a fluke, I have no explanation for it.]

 The only thing that comes to my mind is to run all
 important calculations in a virtual machine image which is then signed
 and stored in case the results need verification. But, maybe there are
 other options?

 We have found that running the exact same code and the same Debian OS on
 differing CPUs will lead to differing results.  They differ because the
 IEEE FP standard isn't implemented exactly the same on all CPUs.  The
 results will differ in only the least significant digits.  But if you
 use simulations that consume random numbers and compare them against FP
 values, this can lead to much grosser divergences.  However, with a large
 enough sample the results are all statistically consistent.

 I don't know how that translates when using virtual machines on
 different host CPUs, but if you care about bit-for-bit identity, these
 FP differences may percolate up through the VM and ruin that.  Anyway,
 in the end, all CPUs give the wrong results since FP calculations are
 not infinitely precise, so striving for bit-for-bit consistency is kind
 of a pipe dream.


 -Brett.







Re: Reproducibility

2010-04-30 Thread Johan Grönqvist

2010-04-30 16:29, Michael Hanke wrote:


Usually we have some version in stable and some people will use it.


[...]


In
Debian we have the universal operating system that incorporates all
software and 'stable' is a snapshot of everything at the time of release
-- and this is not what scientists want.

That is why we have backports.org and neuro.debian.net that offer at
least the latest and greatest for 'stable'. But this is still not
enough.


This is why I like the approach of GoboLinux, at least in theory.

As I understand the basic idea of GoboLinux, every package follows a
system like the Debian alternatives system, where the alternatives are
the different versions of that package. Upgrading therefore does not
have to remove old versions, but can just install the new version and
update the symlink; removing the old package can then be a separate
procedure.


To me (IMHO) that feels like _the_ solution, when combined with the
Debian snapshot service. I imagine maintenance would not be a problem. I
would be happy to have only the same level of support as we have now,
but with the eternal _availability_ of all packages. Perhaps other users
have other requirements.


Of course this also requires more from the alternatives-handling
software, in order to handle versioned dependencies when switching the
alternatives system between different versions, but I expect that to be
doable using tools similar to those that exist now.
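
To make the layout concrete, a toy sketch of the idea (the root path and
package name are made up; a real tool would unpack files, track versioned
dependencies and switch links atomically):

import os

def install_version(root, package, version, switch_current=True):
    # Lay out <root>/<package>/<version>/ and point a 'current' symlink at it;
    # old versions stay installed until they are removed explicitly.
    version_dir = os.path.join(root, package, version)
    os.makedirs(version_dir, exist_ok=True)
    if switch_current:
        link = os.path.join(root, package, "current")
        if os.path.islink(link):
            os.remove(link)
        os.symlink(version, link)  # relative link inside the package dir

# Two versions coexist; reproducing an old run is just flipping the link back.
install_version("/tmp/programs", "somesolver", "1.2")
install_version("/tmp/programs", "somesolver", "1.3")
install_version("/tmp/programs", "somesolver", "1.2")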


Regards

Johan





Re: Reproducibility

2010-04-30 Thread Yaroslav Halchenko

On Fri, 30 Apr 2010, Johan Grönqvist wrote:
 That is why we have backports.org and neuro.debian.net that offer at
 least the latest and greatest for 'stable'. But this is still not
 enough.
 To me (IMHO) that feels like _the_ solution, when combined with the
 Debian snapshot service.
Exactly that -- snapshots! But not combined with anything: alternatives
are not a solution, since they might be harder to control, IMHO.

But consider the snapshot.debian.org approach -- if the research system was
kept at a specific date, you can later deploy exactly the same environment
with consistent versioning with ease, and probably also simply
within a chroot using debootstrap, limited only by the speed of the
mirror. The only thing to take care of would be exactly the confusing
part -- alternatives (and possibly a custom system configuration, if it
was of any relevance).
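
The debootstrap part could look roughly like this (a sketch only -- the
date, suite and target path are placeholders; it needs root, the
debootstrap package, and assumes snapshot.debian.org's dated-mirror URL
layout):

import subprocess

def build_snapshot_chroot(date, suite, target):
    # Debootstrap a chroot against the archive as it looked on a given date,
    # so the package versions match those of the original run.
    mirror = "http://snapshot.debian.org/archive/debian/%s/" % date
    subprocess.check_call(["debootstrap", suite, target, mirror])

# e.g. recreate (approximately) an environment from 2010-04-30:
# build_snapshot_chroot("20100430T000000Z", "lenny", "/srv/chroots/run-20100430")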

N.B. A note for our neuro.debian.net -- we probably should set up such a
     snapshot service ;-)

The alternatives (or modules, in some other research
environments/systems) solution is indeed appealing for deploying
heterogeneous systems which aim to satisfy a variety of
researchers/projects at once (for example, a university-wide
high-performance cluster), if those groups indeed require some custom
software not available natively as part of the OS.  But I think it just
complicates reproducibility -- a complete chroot/virtual machine sounds
more appealing if reincarnation of the environment is necessary.

-- 
  .-.
=--   /v\  =
Keep in touch// \\ (yoh@|www.)onerussian.com
Yaroslav Halchenko  /(   )\   ICQ#: 60653192
   Linux User^^-^^[17]






Re: Reproducibility

2010-04-30 Thread Yaroslav Halchenko

On Fri, 30 Apr 2010, Antonio Paiva wrote:
 http://www.vistrails.org/index.php/Downloads. I remember that the
 software keeps track of the libraries, OS, and CPU that the code is
 using to get the results.

 Best,
 António Rafael C. Paiva
 Post-doctoral fellow
 SCI Institute, University of Utah
 Salt Lake City, UT
ha -- also the land of SCIRun pipes and toys:
http://www.sci.utah.edu/cibc/software.html

If we are to share links, here is, IMHO, another very relevant approach
to reproducibility assurance within research itself:

http://neuralensemble.org/trac/sumatra/wiki

Sumatra is a tool for managing and tracking projects based on numerical
simulation or analysis, with the aim of supporting reproducible research. It
can be thought of as an automated electronic lab notebook for
simulation/analysis projects. 


-- 
  .-.
=--   /v\  =
Keep in touch// \\ (yoh@|www.)onerussian.com
Yaroslav Halchenko  /(   )\   ICQ#: 60653192
   Linux User^^-^^[17]


