Hello,

We are closing in on the workflows. What is still missing are the mostly invariant inputs, such as the genomes of pathogens and, above all, the reference genomes of human, mouse, rat, worm, fly, ... you name them.
Unlike a few years ago, hard drives are now big enough to accommodate a genome or two and the indexes derived from them. However, I don't think we want to organize, within our regular Debian infrastructure, something as volatile as public genomes (yes, they are still updated regularly, very much so) and as security-irrelevant (it is just data). Also, sites will differ a lot in where this data should be stored, and the scripts involved should likely be run by a non-root user.

There are public sites from which this data can be downloaded. Any redundancy with these sites IMHO mostly hurts us. On the other hand, for getting something up quickly and for reproducibility tests, our infrastructure is difficult to beat.

Please kindly throw your ideas at me on how you would like whole genomes to be presented by Debian to the average user and to professionals. Just reply to this thread and/or send me "+1"s in a PM, and I will summarize this in a document, which I suggest we then discuss in a Jitsi meeting.

Best,
Steffen

