Hello again Ludovic,
On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote:
Hi!
Amirouche Boubekki <amirou...@hypermove.net> skribis:
tl;dr: Distribution of data and software seems similar.
Data is more and more important in software and reproducible
science. The data science ecosystem lacks resource sharing.
I think guix can help.
I think some of us, especially Guix-HPC folks, are convinced about the
usefulness of Guix as one of the tools in the reproducible science
toolchain (that was one of the themes of my FOSDEM talk). :-)
Now, whether Guix is the right tool to distribute data, I don’t know.
Distributing large amounts of data is a job in itself, and the store
isn’t designed for that. It could quickly become a bottleneck.
What does it mean technically that the store “isn't designed for that”?
That’s one of the reasons why the Guix Workflow Language (GWL)
does not store scientific data in the store itself.
Sorry, I did not follow the engineering discussion around GWL.
Searching the web led me to [0]. That said, the question I am
asking is not answered there. In particular, there is no rationale
for that choice in the design paper.
[0] http://lists.gnu.org/archive/html/guix-devel/2016-10/msg01248.html
I think data should probably be stored and distributed out-of-band using
appropriate storage mechanisms.
Then, in a follow-up mail, you reply to Konrad:
Konrad Hinsen <konrad.hin...@fastmail.net> skribis:
[...]
It would be nice if big datasets could conceptually be handled in the
same way while being stored elsewhere - a bit like git-annex does for
git. And for parallel computing, we could have special build daemons.
Exactly. I think we need a git-annex/git-lfs-like tool for the store.
(It could also be useful for things like secrets, which we don’t want
to have in the store.)
-
"The most basic of all human needs is the need to understand and be
understood" Ralph Nichols