Re: Reproducibility
On Fri, 30 Apr 2010, Antonio Paiva wrote:
> http://www.vistrails.org/index.php/Downloads. I remember that the
> software keeps track of the libraries, OS, and CPU that the code is
> using to get the results.
>
> Best,
> António Rafael C. Paiva
> Post-doctoral fellow
> SCI Institute, University of Utah
> Salt Lake City, UT

Ha -- also the land of SCIRun pipes and toys:
http://www.sci.utah.edu/cibc/software.html

Since we are sharing links, here is, IMHO, another very relevant approach to assuring reproducibility within the research itself:

http://neuralensemble.org/trac/sumatra/wiki

Sumatra is a tool for managing and tracking projects based on numerical simulation or analysis, with the aim of supporting reproducible research. It can be thought of as an automated electronic lab notebook for simulation/analysis projects.

Yaroslav Halchenko
(yoh@|www.)onerussian.com
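The "electronic lab notebook" idea -- recording the environment alongside every result -- can be approximated even without a dedicated tool. A minimal sketch of that idea in Python (this is not Sumatra's or VisTrails' actual API; the function names and log file are purely illustrative):

    # Rough sketch of provenance capture, the kind of thing tools like
    # Sumatra or VisTrails automate; names and file layout are made up here.
    import json
    import platform
    import subprocess
    import sys
    import time


    def capture_environment():
        """Record OS, CPU, Python, and installed Debian package versions."""
        pkgs = subprocess.check_output(
            ["dpkg-query", "-W", "-f=${Package} ${Version}\\n"])
        return {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "os": platform.platform(),
            "cpu": platform.processor(),
            "python": sys.version,
            "packages": pkgs.decode().splitlines(),
        }


    def run_and_record(analysis, params, logfile="labnotebook.json"):
        """Run one analysis step and append its result plus provenance."""
        entry = {"params": params,
                 "result": analysis(**params),
                 "environment": capture_environment()}
        with open(logfile, "a") as fh:
            fh.write(json.dumps(entry) + "\n")
        return entry["result"]

Routing an analysis through something like run_and_record leaves enough of a trail to know at least which library versions produced a given number.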
Re: Reproducibility
On Fri, 30 Apr 2010, Johan Grönqvist wrote:
> > That is why we have backports.org and neuro.debian.net that offer at
> > least the latest and greatest for 'stable'. But this is still not
> > enough.
> To me (IMHO) that feels like _the_ solution, when combined with the
> debian snapshot service.

Exactly that -- snapshots! But not combined with anything: alternatives are not a solution, since they might be harder to control, IMHO. Consider instead the snapshot.debian.org approach -- if the research system was kept up to a specific date, you can later redeploy exactly the same environment, with consistent versioning, with ease -- probably even simply within a chroot created with debootstrap, in a matter of minutes (limited mostly by the speed of the mirror). The only thing left to take care of would be exactly the confusing part -- alternatives (and possibly a custom system configuration, if it was of any relevance).

N.B. Note for our neuro.debian.net -- we probably should set up such a snapshot service ;-)

The "alternatives" (or "modules" in some other research environments/systems) solution is indeed appealing for deploying heterogeneous systems which aim to satisfy a variety of researchers/projects at once (for example, a university-wide high-performance cluster), if those groups indeed require some custom software not available natively as part of the OS. But I think it just complicates reproducibility -- a complete chroot/virtual machine sounds more appealing if the environment ever needs to be reincarnated.

Yaroslav Halchenko
(yoh@|www.)onerussian.com
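To make that concrete, here is a rough sketch of the kind of record one would need in order to redeploy from snapshot.debian.org later. The archive URL layout follows snapshot.debian.org's documented scheme, but the suite name, file names, and the debootstrap invocation in the comment are only illustrative assumptions:

    # Sketch only: record enough to rebuild the environment from
    # snapshot.debian.org later. The 'lenny' suite and file names are
    # assumptions for illustration.
    import subprocess
    import time

    SNAPSHOT = "http://snapshot.debian.org/archive/debian/%s/"


    def freeze(manifest="packages.txt", sources="sources.list.snapshot",
               suite="lenny"):
        """Dump installed package versions and a matching snapshot source."""
        stamp = time.strftime("%Y%m%dT000000Z", time.gmtime())
        pkgs = subprocess.check_output(
            ["dpkg-query", "-W", "-f=${Package} ${Version}\\n"])
        with open(manifest, "w") as fh:
            fh.write(pkgs.decode())
        with open(sources, "w") as fh:
            fh.write("deb %s %s main\n" % (SNAPSHOT % stamp, suite))
        # Later, roughly the same environment could be recreated in a
        # chroot with something like:
        #   debootstrap lenny /srv/chroot-20100430 \
        #       http://snapshot.debian.org/archive/debian/20100430T000000Z/


    if __name__ == "__main__":
        freeze()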
Re: Reproducibility
2010-04-30 16:29, Michael Hanke wrote:
> Usually we have some version in stable and some people will use it.
> [...]
> In Debian we have the universal operating system that incorporates all
> software and 'stable' is a snapshot of everything at the time of release
> -- and this is not what scientists want. That is why we have backports.org
> and neuro.debian.net that offer at least the latest and greatest for
> 'stable'. But this is still not enough.

This is why I like the approach of GoboLinux, at least in theory. As I understand the basic idea of GoboLinux, every package follows a system like the Debian alternatives system, where the alternatives are the different versions of that package. Upgrading therefore does not have to remove old versions, but can just install the new version and update the symlink; removing the old package can then be a separate procedure.

To me (IMHO) that feels like _the_ solution, when combined with the Debian snapshot service. I imagine maintenance would not be a problem. I would be happy to have only the same level of support as we have now, but with the eternal _availability_ of all packages. Perhaps other users have other requirements.

Of course this also requires more from the alternatives-handling software, in order to handle versioned dependencies when switching the alternatives system between different versions, but I expect that to be doable using tools similar to those that exist now.

Regards

Johan
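A toy illustration of that versioned-alternatives idea -- nothing GoboLinux-specific, just the symlink mechanics, with an invented directory layout:

    # Toy version switching via symlinks, in the spirit of GoboLinux or
    # Debian alternatives: versions are installed side by side and a
    # 'current' link points at the active one. Paths are illustrative.
    import os


    def activate(prefix, package, version):
        """Point <prefix>/<package>/current at the requested version."""
        target = os.path.join(prefix, package, version)
        if not os.path.isdir(target):
            raise ValueError("version %s of %s is not installed"
                             % (version, package))
        link = os.path.join(prefix, package, "current")
        tmp = link + ".new"
        os.symlink(version, tmp)   # relative link inside the package dir
        os.rename(tmp, link)       # atomic switch; old versions stay put


    # Example: activate("/opt/science", "fsl", "4.1.4")

Upgrading then never destroys an old version; rolling back is just pointing the link at a different directory.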
Re: Reproducibility
Those of you interested in reproducibility might be interested in VisTrails. There is a start-up commercializing the software, but most of it is free and development is open source, available from http://www.vistrails.org/index.php/Downloads. I remember that the software keeps track of the libraries, OS, and CPU that the code is using to get the results.

Best,

António Rafael C. Paiva
Post-doctoral fellow
SCI Institute, University of Utah
Salt Lake City, UT

On Fri, Apr 30, 2010 at 8:51 AM, Brett Viren wrote:
> Teemu Ikonen writes:
> > Does anyone here have good ideas on how to ensure reproducibility in
> > the long term?
>
> Regression testing, as mentioned, or running some fixed analysis and
> statistically comparing the results to past runs.
[...]
Re: Reproducibility
On Fri, 2010-04-30 at 14:18 +0200, Andreas Tille wrote:
> I can confirm that this is actually the reason why at Sanger Institute
> (even if there are three DDs working) plain Debian (and specifically the
> Debian Med packages) is not used.

FYI, I uploaded a new version of the Med packages on Monday which I believe passes all of the tests (at least André Espaze got them all to pass when he tried the package a while ago).

-Adam

--
GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6
Engineering consulting with open source tools
http://www.opennovation.com/
Re: Reproducibility
Teemu Ikonen writes:
> Does anyone here have good ideas on how to ensure reproducibility in
> the long term?

Regression testing, as mentioned, or running some fixed analysis and statistically comparing the results to past runs.

We worry about reproducibility in my field of particle physics. We run on many different Linux and Mac platforms and strive for statistical consistency (see below), not identical consistency. I don't recall there ever being an issue with different versions of, say, Debian system libraries. Any inconsistencies we have found have been due to version shear in different copies of our own codes.

[Aside: I have seen gross differences between Debian and RH-derived platforms. In a past experiment I was the only collaborator working on Debian and almost everyone else was using Scientific Linux (an RHEL derivative). I kept getting bitten by our code crashing on me. It seems, for some reason, my compilations tended to put garbage in uninitialized pointers where on SL they tended to get NULL. So, I was the lucky one to find and fix a lot of programming mistakes. This could have just been a fluke; I have no explanation for it.]

> The only thing that comes to my mind is to run all
> important calculations in a virtual machine image which is then signed
> and stored in case the results need verification. But, maybe there are
> other options?

We have found that running the exact same code and the same Debian OS on differing CPUs will lead to differing results. They differ because the IEEE FP "standard" isn't implemented exactly the same on all CPUs. The results will differ in only the least significant digits. But, if you use simulations that consume random numbers and compare them against FP values, this can lead to more gross divergences. However, with a large enough sample the results are all statistically consistent.

I don't know how that translates when using virtual machines on different host CPUs, but if you care about bit-for-bit identity, this FP "standard" may percolate up through the VM and ruin that. Anyway, in the end, all CPUs give the "wrong" results since FP calculations are not infinitely precise, so striving for bit-for-bit consistency is kind of a pipe dream.

-Brett.
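The "statistically consistent rather than bit-for-bit identical" check Brett describes is easy to express in code. A minimal sketch -- the tolerances, the two-sample KS test, and the result file names are illustrative choices, not anything a particular collaboration actually uses:

    # Compare a new run against stored reference results: statistical
    # consistency rather than bit-for-bit equality. Tolerances, the KS
    # test, and the file names are illustrative only.
    import numpy as np
    from scipy import stats


    def bitwise_identical(new, ref):
        return np.array_equal(new, ref)   # rarely holds across CPUs


    def numerically_consistent(new, ref, rtol=1e-10, atol=1e-12):
        """Agree to within expected FP rounding differences."""
        return np.allclose(new, ref, rtol=rtol, atol=atol)


    def statistically_consistent(new_sample, ref_sample, alpha=0.01):
        """For Monte Carlo outputs: drawn from the same distribution?"""
        _stat, pvalue = stats.ks_2samp(new_sample, ref_sample)
        return pvalue > alpha


    if __name__ == "__main__":
        ref = np.loadtxt("reference_results.txt")   # hypothetical stored run
        new = np.loadtxt("current_results.txt")
        print("bitwise:", bitwise_identical(new, ref))
        print("numeric:", numerically_consistent(new, ref))
        print("statistical:", statistically_consistent(new, ref))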
Re: Reproducibility
On Fri, Apr 30, 2010 at 03:23:42PM +0200, Andreas Tille wrote:
> On Fri, Apr 30, 2010 at 09:30:16AM -0300, David Bremner wrote:
> > > Yes, that's the problem.
> >
> > For stable releases though, we have the time, and we can (I suspect) get
> > the compute cycles to run heavy regression tests. Would that be a
> > worthwhile project?
>
> Well, it is not me who raised this problem, so I do not feel really
> able to give a definite answer. But as I understood them, the scientists
> at Sanger do not really care about stable Debian. They care about a
> really specific version of a specific piece of software. Perhaps the
> version in stable is too old -- or it might even be too new (if they
> want to reproduce old results). I really doubt that these people, who
> are used to sticking to such a version, will care about Debian's
> regression tests if they have the chance to simply install their own
> version.

Right. Usually we have some version in stable and some people will use it. In general, however, people want the stable 'operating system' and _in addition_ a multitude of versions of their critical applications. In Debian we have the universal operating system that incorporates all software, and 'stable' is a snapshot of everything at the time of release -- and this is not what scientists want. That is why we have backports.org and neuro.debian.net that offer at least the latest and greatest for 'stable'. But this is still not enough.

Ideally, I would keep maintaining each and every package version for an indefinite period of time -- that should make everybody happy, but I'm clearly not going to do this unless my day gets an additional 24 hours ;-)

BUT: I believe people would be a lot happier to keep upgrading to the latest versions IF there were a standardized, upstream-supported method to perform some reasonable tests. This is a topic that Yarik stripped from his talk, but it is still very interesting to talk about: we, as Debian, could do a lot to help upstream projects deploy their software in a more sane way, by offering concrete guidelines and facilities to do what is necessary to ensure proper behavior.

IMHO, sticking to old versions is a reality in the science community -- but it is a problem that should be solved, not supported.

Michael

--
GPG key: 1024D/3144BE0F
Michael Hanke
http://mih.voxindeserto.de
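One hypothetical shape such a standardized, upstream-supported test method could take is a post-install self-test entry point that reruns a quick analysis against reference values shipped with the package. Everything below -- the package name, the reference file, the tolerance -- is invented for illustration:

    # Hypothetical post-install self-test for a scientific package,
    # runnable by a user as:  python -m mypkg.selftest
    # Package name, reference file, and tolerance are invented here.
    import json
    import unittest

    import numpy as np


    def toy_analysis(data):
        """Stand-in for the package's real pipeline."""
        return {"mean": float(np.mean(data)), "std": float(np.std(data))}


    class SelfTest(unittest.TestCase):
        def test_reference_dataset(self):
            # Reference inputs/outputs shipped with the package, produced
            # once on a known-good system.
            with open("reference.json") as fh:
                ref = json.load(fh)
            result = toy_analysis(np.asarray(ref["input"]))
            for key, expected in ref["output"].items():
                self.assertAlmostEqual(result[key], expected, places=6)


    if __name__ == "__main__":
        unittest.main()

If upstream shipped something in this spirit, a user (or a distribution QA service) could run it after any library upgrade instead of pinning versions forever.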
Re: Reproducibility
On Fri, Apr 30, 2010 at 09:30:16AM -0300, David Bremner wrote:
> > Yes, that's the problem.
>
> For stable releases though, we have the time, and we can (I suspect) get
> the compute cycles to run heavy regression tests. Would that be a
> worthwhile project?

Well, it is not me who raised this problem, so I do not feel really able to give a definite answer. But as I understood them, the scientists at Sanger do not really care about stable Debian. They care about a really specific version of a specific piece of software. Perhaps the version in stable is too old -- or it might even be too new (if they want to reproduce old results). I really doubt that these people, who are used to sticking to such a version, will care about Debian's regression tests if they have the chance to simply install their own version.

So for the application at Sanger I mentioned, it is probably wasted energy / manpower. For other cases it might make sense, yes.

Kind regards

Andreas.

--
http://fam-tille.de
Re: Reproducibility
On Fri, Apr 30, 2010 at 07:07:21AM -0400, Michael Hanke wrote:
> > This nice abstract inspired me to think about reproducibility of
> > program runs. If one runs e.g. Debian unstable, the OS code which can
> > potentially affect the results of calculations can change almost
> > daily. Reproducing results later can be close to impossible unless
> > the versions of all the related libraries etc. are written down somewhere.
>
> This is not just a potential problem -- we have seen it happen already.
> Part of the problem is that in Debian we prefer dynamic linking to
> up-to-date shared libs from separate packages -- instead of statically
> linking to ancient versions with known behavior (for good reasons,
> of course).

I can confirm that this is actually the reason why at the Sanger Institute (even though there are three DDs working there) plain Debian (and specifically the Debian Med packages) is not used. The requirement of the scientists is to stick to a very specific version of the packages (not necessarily those which are part of a stable Debian release), and some labs use different versions than other labs.

> IMHO, better than relying on a snapshot of the OS and a particular
> software state to get constant results, projects should have
> comprehensive regression tests that ensure proper behavior.

In theory this is probably right, but in practice it needs extra manpower, which I doubt will be spent on problems like this.

> The problem, however, is
> that we cannot run them during package build time, since they tend to
> require large datasets and run for many hours. Therefore users need to
> do that, but nobody does it.

Yes, that's the problem.

Kind regards

Andreas.

--
http://fam-tille.de
Re: Reproducibility
Hi,

On Fri, Apr 30, 2010 at 10:01:23AM +0200, Teemu Ikonen wrote:
> On Fri, Apr 30, 2010 at 2:08 AM, Michael Hanke wrote:
> > Debian: The ultimate platform for neuroimaging research
> [...]
> > However, it is hard to blame the respective developers, because the
> > sheer number of existing combinations of operating systems, hardware,
> > and library versions makes it almost impossible to verify that a
> > particular software is working as intended. Restricting the
> > ``supported'' runtime environment is one approach of making
> > verification efforts feasible.
>
> Dear list,
>
> This nice abstract inspired me to think about reproducibility of
> program runs. If one runs e.g. Debian unstable, the OS code which can
> potentially affect the results of calculations can change almost
> daily. Reproducing results later can be close to impossible unless
> the versions of all the related libraries etc. are written down somewhere.

This is not just a potential problem -- we have seen it happen already. Part of the problem is that in Debian we prefer dynamic linking to up-to-date shared libs from separate packages -- instead of statically linking to ancient versions with known behavior (for good reasons, of course).

> Does anyone here have good ideas on how to ensure reproducibility in
> the long term? The only thing that comes to my mind is to run all
> important calculations in a virtual machine image which is then signed
> and stored in case the results need verification. But maybe there are
> other options?

IMHO, better than relying on a snapshot of the OS and a particular software state to get constant results, projects should have comprehensive regression tests that ensure proper behavior. The problem, however, is that we cannot run them during package build time, since they tend to require large datasets and run for many hours. Therefore users need to do that, but nobody does it.

Michael

--
GPG key: 1024D/3144BE0F
Michael Hanke
http://mih.voxindeserto.de
Reproducibility
On Fri, Apr 30, 2010 at 2:08 AM, Michael Hanke wrote:
> Debian: The ultimate platform for neuroimaging research
[...]
> However, it is hard to blame the respective developers, because the
> sheer number of existing combinations of operating systems, hardware,
> and library versions makes it almost impossible to verify that a
> particular software is working as intended. Restricting the
> ``supported'' runtime environment is one approach of making
> verification efforts feasible.

Dear list,

This nice abstract inspired me to think about reproducibility of program runs. If one runs e.g. Debian unstable, the OS code which can potentially affect the results of calculations can change almost daily. Reproducing results later can be close to impossible unless the versions of all the related libraries etc. are written down somewhere.

Does anyone here have good ideas on how to ensure reproducibility in the long term? The only thing that comes to my mind is to run all important calculations in a virtual machine image which is then signed and stored in case the results need verification. But maybe there are other options?

Best,

Teemu
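The "signed and stored" part of that suggestion is mechanically simple. A minimal sketch of recording a fingerprint of the VM image next to the results so they can be verified later -- the file names are placeholders, and a detached GPG signature of the manifest would normally be made on top of this:

    # Record a fingerprint of the analysis VM image next to the results,
    # so the image used for a publication can be verified later.
    # File names are placeholders; a GPG signature of the manifest would
    # normally be added on top of this.
    import hashlib
    import json
    import time


    def sha256sum(path, chunk=1 << 20):
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for block in iter(lambda: fh.read(chunk), b""):
                digest.update(block)
        return digest.hexdigest()


    def write_manifest(image="analysis-vm.img", results="results.tar.gz",
                       manifest="manifest.json"):
        record = {
            "created": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "vm_image": {"file": image, "sha256": sha256sum(image)},
            "results": {"file": results, "sha256": sha256sum(results)},
        }
        with open(manifest, "w") as fh:
            json.dump(record, fh, indent=2)


    if __name__ == "__main__":
        write_manifest()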