RE: guix and mirroring dataset

2021-05-26 Thread Cook, Malcolm
>> Do the Guix project and its members suggest best guix-ish practices for
>> managing on-premise mirrors of large file-based datasets such as
>> appear in genomics HPC environments?
>
>From my understanding, it is still “unsolved” and there is no clear
>answer.
>
>Basically, the /gnu/store is not designed for managing large datasets and
>something is somehow missing. On the mailing list gwl-de...@gnu.org, we
>have already discussed that point, although nothing came of it, AFAIU.
>Recently, we discussed again, see the thread:
>
>

Nice - I missed that thread. It brings up good considerations:
 - "immutable" v "mutable" resources
 - IPFS as possible means of distribution

>
>Your input is welcome. :-)

I was expecting to find workflows that have been developed for mirroring 
(downloading) genomic resources from sites such as Ensembl/NCBI/UCSC, etc, and 
then creating on-prem derived resources (e.g. blast indexes).  

I currently tend to do this with GNU Make and shell scripting.
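
As a rough illustration of what such Makefiles do, here is a minimal Python sketch of the rule "rebuild a derived resource when it is missing or older than the mirrored file it comes from". The function name `needs_rebuild` and the file names in the usage note are made up for the example; this is not part of Guix, GWL, or GoGetData.

```python
import os


def needs_rebuild(target, sources):
    """Return True if `target` is missing or older than any of `sources`.

    This mimics the timestamp rule GNU Make applies to a target and its
    prerequisites.  All names here are hypothetical.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A source newer than the target means the derived file is stale.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

For instance, `needs_rebuild("genome.idx", ["genome.fa"])` could be checked before re-running an indexer on a freshly synced FASTA file.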

I was not expecting to find guix efforts toward maintaining such pre-computed 
derived datasets in an upstream repository of any sort, though that would be 
valuable to some.  Illumina for instance (used to?) keep selected genome 
indices for use with their software.  But that is not what I seek, and I 
think much of your remaining reply assumes it is.

>> Perhaps a guix-ish response to [Go Get Data \(GGD\) is a framework
>> that facilitates reproducible access to genomic
>> data](https://www.nature.com/articles/s41467-021-22381-z) 
>
>AFAIR, Ricardo pointed to GoGetData. Personally, I have not yet looked
>at the details.

GoGetData does not seek to make upstream derived datasets available.  Rather, 
their aim is to serve "as a fast, reproducible approach to installing 
standardized data recipes".  I assume GWL would be a good language to write 
such recipes, and that someone may already be doing so.

GoGetData recipes are just bash scripts organized in a particular folder 
structure in a GitHub repo that are expected to conform to a few conventions 
(e.g. variable names for genomes, species, etc.) with a required YAML schema for 
their metadata.  They do not have any advanced workflow capabilities such as GWL 
might provide.

>> That would build on GWL?
>
>From my understanding, something is missing between ‘packages’,
>‘process’ and ‘workflow’, for instance ‘data’. And speaking about
>genomics, there are two kinds of large data:
>
>- fixed output (immutable?): think FASTA and FASTQ
>- computed output (mutable?): think BAM and indexes
>
>and it is not clear how to deal with them. And once that is answered, how
>to share them (substitutes)? HTTP, as everyone does, but we might also
>want IPFS or anything else that would avoid the mirroring/sync
>issues.
>
>> Use cases would be, e.g. download/sync selected (versions of) genomes
>> from Ensembl/NCBI etc and index them for Blast, blat, bowtie{2}, bwa,
>> STAR, GMAP, HiSAT, IGV, BioConductor, etc... 
>>
>> I see much that addresses analysis workflows, such as
>> - [Reproducible genomics analysis pipelines with GNU 
>> Guix](https://www.biorxiv.org/content/10.1101/298653v2.full)
>> - [Scalable Workflows and Reproducible Data Analysis for 
>> Genomics](https://pubmed.ncbi.nlm.nih.gov/31278683/)
>> - [PiGx: reproducible genomics analysis pipelines with GNU 
>> Guix](https://academic.oup.com/gigascience/article/7/12/giy123/5114263)
>>
>> Am I missing similar efforts toward maintaining an up-to-date catalog
>> of the genomic resources that such workflows require? 
>
>For now, some are maintained as packages, for instance:
>
>$ guix search "^r-" hg19 | recsel -C -P name
>r-phastcons100way-ucsc-hg19
>r-bsgenome-hsapiens-ucsc-hg19-masked
>r-txdb-hsapiens-ucsc-hg19-knowngene
>r-bsgenome-hsapiens-ucsc-hg19
>r-snplocs-hsapiens-dbsnp144-grch37
>r-illuminahumanmethylation450kanno-ilmn12-hg19
>r-fdb-infiniummethylation-hg19
>r-copyhelper

Yes, thanks, I see that guix has versions of BioConductor data packages.  These 
are an interesting use case.

>
>which are relatively small, for another instance:
>
>--8<---cut here---start->8---
>r-txdb-hsapiens-ucsc-hg38-knowngene total: 91.8 MiB
>r-bsgenome-hsapiens-ucsc-hg38 total: 765.2 MiB
>r-copyhelper total: 42.9 MiB
>--8<---cut here---end--->8---
>
>
>Hope that helps,
>simon

Thanks Simon, I'm pleased to have your thoughts and pointers on this topic...

~Malcolm


Re: guix and mirroring dataset

2021-05-26 Thread zimoun
Hi,

> Do the Guix project and its members suggest best guix-ish practices for
> managing on-premise mirrors of large file-based datasets such as
> appear in genomics HPC environments?

From my understanding, it is still “unsolved” and there is no clear
answer.

Basically, the /gnu/store is not designed for managing large datasets and
something is somehow missing.  On the mailing list gwl-de...@gnu.org, we
have already discussed that point, although nothing came of it, AFAIU.
Recently, we discussed again, see the thread:



Your input is welcome. :-)

> Perhaps a guix-ish response to [Go Get Data \(GGD\) is a framework
> that facilitates reproducible access to genomic
> data](https://www.nature.com/articles/s41467-021-22381-z) 

AFAIR, Ricardo pointed to GoGetData.  Personally, I have not yet looked
at the details.

> That would build on GWL?

From my understanding, something is missing between ‘packages’,
‘process’ and ‘workflow’, for instance ‘data’.  And speaking about
genomics, there are two kinds of large data:

 - fixed output (immutable?): think FASTA and FASTQ
 - computed output (mutable?): think BAM and indexes

and it is not clear how to deal with them.  And once that is answered, how
to share them (substitutes)? HTTP, as everyone does, but we might also
want IPFS or anything else that would avoid the mirroring/sync
issues.

> Use cases would be, e.g. download/sync selected (versions of) genomes
> from Ensembl/NCBI etc and index them for Blast, blat, bowtie{2}, bwa,
> STAR, GMAP, HiSAT, IGV, BioConductor, etc... 
>
> I see much that addresses analysis workflows, such as
>  -  [Reproducible genomics analysis pipelines with GNU 
> Guix](https://www.biorxiv.org/content/10.1101/298653v2.full)
>  - [Scalable Workflows and Reproducible Data Analysis for 
> Genomics](https://pubmed.ncbi.nlm.nih.gov/31278683/)
>  - [PiGx: reproducible genomics analysis pipelines with GNU 
> Guix](https://academic.oup.com/gigascience/article/7/12/giy123/5114263)
>
> Am I missing similar efforts toward maintaining an up-to-date catalog
> of the genomic resources that such workflows require? 

For now, some are maintained as packages, for instance:

  $ guix search "^r-" hg19 | recsel -C -P name
  r-phastcons100way-ucsc-hg19
  r-bsgenome-hsapiens-ucsc-hg19-masked
  r-txdb-hsapiens-ucsc-hg19-knowngene
  r-bsgenome-hsapiens-ucsc-hg19
  r-snplocs-hsapiens-dbsnp144-grch37
  r-illuminahumanmethylation450kanno-ilmn12-hg19
  r-fdb-infiniummethylation-hg19
  r-copyhelper

which are relatively small, for another instance:

--8<---cut here---start->8---
r-txdb-hsapiens-ucsc-hg38-knowngene total: 91.8 MiB
r-bsgenome-hsapiens-ucsc-hg38 total: 765.2 MiB
r-copyhelper total: 42.9 MiB
--8<---cut here---end--->8---


Hope that helps,
simon



Re: git-fetch for emacs-auctex?

2021-05-26 Thread Paul Garlick
Hi Nicolas,

On Tue, 2021-05-25 at 22:10 +0200, Nicolas Goaziou wrote:
> Hello,
> 
> There are no tags in the ELPA repository, but new releases are
> triggered by a bump of "Version:" keyword. So, it is technically
> possible to map a version to a commit hash by looking for such
> changes. This has been suggested already in bug#46489.
> 

Thank you for pointing to the bug report 
https://issues.guix.gnu.org/issue/46849

It is the same issue as found in Nix.  This does not seem to be fixed
yet.  

I imagine there is a choice between an auctex-only fix and a general
ELPA fix.
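
For what it is worth, the mapping from version to commit described above could be sketched as below in Python. It assumes the history is already available as an oldest-first list of (commit, file contents) pairs; the commit identifiers, the sample header format, and the function name `version_bumps` are invented for illustration, and a real tool would read them from git instead.

```python
import re

# Matches an Emacs Lisp package header line such as ";; Version: 13.0.11".
VERSION_RE = re.compile(r"^;+\s*Version:\s*(\S+)", re.MULTILINE)


def version_bumps(history):
    """Map each version to the first commit in which it appears.

    `history` is an oldest-first list of (commit_id, file_text) pairs.
    """
    bumps = {}
    previous = None
    for commit_id, text in history:
        match = VERSION_RE.search(text)
        version = match.group(1) if match else None
        # Record a commit only when the "Version:" value changes.
        if version is not None and version != previous:
            bumps.setdefault(version, commit_id)
        previous = version
    return bumps
```

Whether such a table would be built for AUCTeX only or for every ELPA package is exactly the choice mentioned above.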

Best regards,

Paul.




Re: git-fetch for emacs-auctex?

2021-05-26 Thread Paul Garlick
Hi Leo,

On Tue, 2021-05-25 at 22:00 +0200, Leo Prikler wrote:
> 
> What is this auctex and how does it differ from the one packaged in
> ELPA?
> 
This is the repository for the AUCTeX project.  The home page is 
https://www.gnu.org/software/auctex/

The ELPA package is derived from the upstream version.  It has its own
version numbering scheme.  For example, the current version on ELPA is
13.0.11.  The current upstream version is 12.3.

Every so often the upstream development branch is merged into the ELPA
branch and a new ELPA version is released.

One option for the Guix package would be to bypass ELPA and use the
upstream repository directly.  However, the version string would change
from "13.0.11" to something like "12.3-commit", where 'commit'
identifies the git reference of the upstream branch.

This would solve the reproducibility problem.  However, a one-off
update resulting in a smaller version number might cause confusion.
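
To illustrate the concern, here is a small Python sketch that compares the two strings by their leading numeric components. This is a deliberately naive comparison written for this message, not the one Guix uses, and `numeric_prefix` and `is_downgrade` are hypothetical helpers.

```python
import re


def numeric_prefix(version):
    """Return the leading dotted-number part of `version` as a tuple of ints.

    "12.3-commit" -> (12, 3); anything after the numbers is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def is_downgrade(old, new):
    """True when `new` sorts before `old` under the naive comparison."""
    return numeric_prefix(new) < numeric_prefix(old)
```

With these definitions, `is_downgrade("13.0.11", "12.3-commit")` is true, so a tool comparing versions this way would treat the switch as a step backwards.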

Are there views on this point?

Best regards,

Paul.






Re: Rust freedom issue claim

2021-05-26 Thread Pjotr Prins
On Wed, May 26, 2021 at 04:32:03PM +0200, Ludovic Courtès wrote:
> That’s a somewhat different topic.  FWIW, I’m both excited at the idea
> of having a memory-safe replacement for C gaining momentum, and
> frightened by the prospects of Rust being this replacement, for many
> reasons including: Rust does not have a good bootstrapping story, as we
> know all too well, Cargo encourages sloppy package distribution à la
> npm, Rust in the kernel would give a false sense of safety (it’s still
> that big monolithic blob!), and the Rust community is very much
> anti-copyleft.

Having adopted Rust for some of our bioinformatics work, I can fully
agree. It is actually hard to use Rust without Cargo and it is an
implosion npm-style waiting to happen if the most trivial program
already imports 100+ external packages - some of doubtful quality.

Another thing I have against Rust is its syntax - but that is
(arguably) taste. I can't believe references are written with an
ampersand - and they are so common it is in your face all the time.
That is just noise. And sometimes the borrow checker really gets in
the way (and I pine for GC). We are sticking with Rust though because
the compiler works hard and is a sucker for detail, so it helps both
less and more experienced programmers to avoid C/C++ traps. Also Rust
has no OOP that people can use - I am very happy about that. In short
it is a fairly pragmatic FP language with some nice compile time
features. I don't love it but it is an OK compromise.

For kernels I completely agree with you. Memory safety is a red
herring because we face much deeper problems. Open hardware and
message passing is the way forward.

Oh, did you know Rust expands all sources into one 'blob' for
compilation? At the crate level. It led to the meme: "The Rust
programming language compiles fast software slowly."

I have not hit real issues yet with compilation speed, but it feels
like we regressed to huge C++ template expansion...

> Guix, related projects such as Mes, Gash, and the Shepherd, together
> with the Hurd, offer a very different and (to me) more appealing vision
> for a user-empowering, safer, more robust, and yet POSIX-compliant OS.

Good architecture is far more important than a borrow checker.

Pj.



Re: Rust freedom issue claim

2021-05-26 Thread Ludovic Courtès
Hi,

Bone Baboon  skribis:

> This is an article from Hyperbola about the Rust trademark. It claims
> that Rust has a freedom issue.
> 

(Side note: “freedom issue” is not a helpful term as it could mean all
sorts of things.)

The trademark discussion refers to
, which
dates back to 2018.

In recent years, Mozilla’s trademark policy changed, to the point that
distributions can use the name “Firefox” for packages they provide:

  https://lwn.net/Articles/676799/

Before triggering an alarm, I would check what major distros, and Debian
in particular, are doing about Rust; I have not heard of any concerns so
far.  If the Rust trademark turns out to be a concern, distros should
try hard, collectively, to resolve it through dialog with Rust
Foundation people.

> If Rust does have a freedom issue then there is potential that it could
> have an impact on Linux-libre.  Recently there was a RFC for adding
> support for Rust to the Linux kernel
> .  Linus Torvalds's response is
> here .

That’s a somewhat different topic.  FWIW, I’m both excited at the idea
of having a memory-safe replacement for C gaining momentum, and
frightened by the prospects of Rust being this replacement, for many
reasons including: Rust does not have a good bootstrapping story, as we
know all too well, Cargo encourages sloppy package distribution à la
npm, Rust in the kernel would give a false sense of safety (it’s still
that big monolithic blob!), and the Rust community is very much
anti-copyleft.

Guix, related projects such as Mes, Gash, and the Shepherd, together
with the Hurd, offer a very different and (to me) more appealing vision
for a user-empowering, safer, more robust, and yet POSIX-compliant OS.

Ludo’.



Re: bug#47615: [PATCH 0/9] Add 32-bit powerpc support

2021-05-26 Thread Ludovic Courtès
Hi,

Efraim Flashner  skribis:

> On Tue, May 11, 2021 at 10:24:03PM +0200, Ludovic Courtès wrote:

[...]

>> Maybe it’s more readable to keep it as a bullet list, like:
>> 
>>   @item mips64el-linux (@emph{unsupported})
>>   …
>> 
>>   @item powerpc-linux (@emph{unsupported})
>>   …
>> 
>> with a sentence explaining what “unsupported” means.
>
> That was kind-of my idea with grouping them together

Yes, but people who just skim through that page may just see that
‘powerpc-linux’ is listed, without noticing that there’s a sentence ten
lines above that says “The following arches are not supported”.  Having
“unsupported” next to it should avoid that (yes, I’ve learned to be
super cautious.  :-)).

>> IMO guix.m4 should either require --with-courage or emit a prominent
>> warning for these.

[...]

> The problem then becomes whoever tinkers with it will have to keep
> either a custom guix.m4 for their guix package or a custom guix package
> with "--with-courage" as a configure-flag.

True.  In that case, let it go without ‘--with-courage’ but emit a
warning with AC_MSG_WARN that points to the relevant section of the
manual.

Deal?

Thanks!

Ludo’.



Re: [PATCH RFC 0/4] Getting rid of input labels?

2021-05-26 Thread Ludovic Courtès
Hello,

Nicolas Goaziou  skribis:

> Ludovic Courtès  writes:

[...]

>>   • Packages such as ‘tzdata’ use labels to refer to non-package
>>     inputs.  These cannot be converted to the automatic labeling
>>     style, or not without extra changes.
>
> Would it be possible to write something like
>
>   (inputs (let ((tzcode (origin ...)))
>             (list ... tzcode ...)))
>
> ?

Yes, but the problem is that the automatically-assigned label for
<origin> records is “_” (a placeholder), because origins have no name,
unlike packages.

Thus, this phase:

  (replace 'unpack
    (lambda* (#:key source inputs #:allow-other-keys)
      (invoke "tar" "xvf" source)
      (invoke "tar" "xvf" (assoc-ref inputs "tzcode"))))
… needs to be written differently, because there’s no “tzcode” label.
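
To spell the problem out, here is a toy model in Python rather than Scheme. The classes `Package` and `Origin` and the functions `auto_labels` and `assoc_ref` are stand-ins invented for this sketch, not the actual Guix code; the only behavior taken from the above is that packages are labeled by name and origins get “_”.

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str


@dataclass
class Origin:
    uri: str  # origins carry no name, only where to fetch from


def auto_labels(inputs):
    """Pair each input with an automatically chosen label."""
    labeled = []
    for item in inputs:
        # Packages are labeled by name; anything else gets the placeholder.
        label = item.name if isinstance(item, Package) else "_"
        labeled.append((label, item))
    return labeled


def assoc_ref(alist, key):
    """Return the value of the first pair whose label equals `key`, else None."""
    for label, value in alist:
        if label == key:
            return value
    return None
```

Looking up "tzcode" in `auto_labels([Package("gzip"), Origin("https://example.org/tzcode.tar.gz")])` yields nothing, because the origin was filed under “_” (the URL is of course a placeholder).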

One option on ‘core-updates’ is to use gexps:

  #~(modify-phases %standard-phases
      ;; …
      (replace 'unpack
        (lambda* (#:key source inputs #:allow-other-keys)
          (invoke "tar" "xvf" source)
          (invoke "tar" "xvf" #$tzcode))))

However, this style breaks common uses of ‘inherit’ and uses of the
inputs field: ‘tzcode’ here need not even be listed in ‘inputs’, and
consequently, you cannot easily inherit from ‘tzdata’ and give it a
different ‘tzcode’.

We need to find and encourage appropriate idioms for corner cases like
this.  One option is the status quo: keep using labels in those rare
cases.

A crazier option would be to interpret input lists, when possible, as
both input lists and formal parameter lists for ‘arguments’.  Assume a
package with:

  (inputs (list foo bar baz))

The ‘arguments’ field, which is currently a thunk, would magically be
turned into:

  (lambda (foo bar baz)
    …)

That way, arguments could refer to #$foo etc., and that’d just do the
right thing when inputs are overridden.

This would be reaching a level of magic that may not be wise, and it may
be hard to implement.
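
For the sake of discussion, the idea can be mimicked in Python: the arguments become a function whose parameters are the inputs, so overriding an input automatically changes what the arguments refer to. Everything below (`ToyPackage`, `with_inputs`, the flag string, the store paths) is a made-up model, not a proposal for the actual implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ToyPackage:
    inputs: Dict[str, str]               # input name -> store directory
    arguments: Callable[..., List[str]]  # called with the inputs as keywords

    def build_flags(self):
        # The inputs are passed as actual parameters at the last moment.
        return self.arguments(**self.inputs)

    def with_inputs(self, **overrides):
        # Rough analogue of inheriting from a package and replacing an input.
        return replace(self, inputs={**self.inputs, **overrides})


base = ToyPackage(
    inputs={"foo": "/store/foo-1", "bar": "/store/bar-1"},
    arguments=lambda foo, bar: [f"--with-foo={foo}"],
)
variant = base.with_inputs(foo="/store/foo-2")
```

Here `variant.build_flags()` picks up the replacement ‘foo’ without the arguments being rewritten, which is the property the gexp style above lacks.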

> Could the new syntax accept both variables and specifications, e.g.,
>
>(list "glib:bin" foo "bar@2.3")
>
> ?

No!  I mean, yes it could, but no, I don’t think that’s a good idea.
:-)

In terms of API, I prefer clarity; in this case, I think inputs should
be a list of packages or other “lowerable” objects, rather than a list
of “anything” that could be magically interpreted at run time.

More importantly, I think package specs are a UI feature, and that
packages should be separate from UI concerns.  Specs are implemented in
(gnu packages), not in (guix …) for a reason: it’s a feature that makes
sense in the distro, not in the core Guix machinery where there’s no
“package set” to traverse.

I hope that makes sense!

Ludo’.



Re: [PATCH RFC 0/4] Getting rid of input labels?

2021-05-26 Thread Ludovic Courtès
Hi Maxime,

Maxime Devos  skribis:

> Ludovic Courtès schreef op do 20-05-2021 om 16:58 [+0200]:
>> Hello Guix!
>> 
>> Here’s a proposal for a soft revolution: getting rid of input labels
>> in package definitions.  Instead of writing: [...]
>>
>> one can write:
>> 
>> (native-inputs (list autoconf automake pkg-config guile-3.0))
>> [...]
>
> This concept LGTM (but I haven't looked closely at the patches), but
> as noted on #guix, some issues with eliminating labels completely:
>
> A package definition of P may require both Q@1.0 and Q@2.0 as inputs,
> in which case a ‘label collision’ would be created if we generate
> labels from package names. More specifically, I'm thinking of packaging
> go-ipfs-migrations (or what's its name ...). It would be a good idea
> to add an (additional?) test to actually try to migrate from
> go-ipfs@first-version to go-ipfs@another-version.

Keep in mind that labels exist to make it easier to refer to a specific
input from the build side—in a phase, configure flag, etc.

In many cases, you don’t need the ability to refer to a specific input;
you just need all the inputs to contribute to search path environment
variables, and that’s enough.  A “label collision” does not matter at
all in this case.
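
A toy illustration in Python (not Guix code; the labels, store paths, and helper names are invented): collecting search-path entries walks every input, so duplicate labels are harmless, whereas a lookup by label only ever sees the first match.

```python
def search_path(inputs, subdir):
    """Join `subdir` under every input directory, ignoring labels entirely."""
    return ":".join(f"{directory}/{subdir}" for _label, directory in inputs)


def first_by_label(inputs, label):
    """Return the directory of the first input carrying `label`, else None."""
    return next((directory for name, directory in inputs if name == label), None)


# Two versions of the same (hypothetical) package share one label.
inputs = [
    ("go-ipfs", "/store/go-ipfs-1.0"),
    ("go-ipfs", "/store/go-ipfs-2.0"),
]
```

Both directories end up in `search_path(inputs, "bin")`, while `first_by_label(inputs, "go-ipfs")` can only ever return the 1.0 entry.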

In some cases, you do need to refer to a specific input, as in:

  #:configure-flags (list (string-append "--with-gmp-prefix="
                                         (assoc-ref %build-inputs "gmp")))

In this case, there are now two options:

  1. Arrange so that the label is unique among your inputs, as is already
     the case.

  2. Use a gexp instead (possible on ‘core-updates’) like so:

       #:configure-flags #~(list (string-append "--with-gmp-prefix=" #$gmp))

     or, to allow for inheritance:

       #:configure-flags #~(list (string-append "--with-gmp-prefix="
                                                #$@(assoc-ref
                                                    (package-inputs this-package)
                                                    "gmp")))

     The second variant is ugly, but we could provide helpers to make it
     prettier.

Do you think there are unaddressed issues with go-ipfs-migrations?

Thanks for your feedback!

Ludo’.



Re: [PATCH RFC 0/4] Getting rid of input labels?

2021-05-26 Thread Ludovic Courtès
Hi Vincent,

Vincent Legoll  skribis:

> What about
>
>> (native-inputs
>>  `(,autoconf
>>    ("truc" ,muche)
>>    "pkg-config"
>>    ))
>
> i.e. allowing package objects, tuples and names, and it would DTRT ?
>
> Wouldn't something like that be possible ?

It would be possible, but I’d rather not allow for mixing styles.  Not
allowing mixed styles means it’s easier to check for correctness, and
the “auto labeling” code has fewer checks to make.

Ludo’.



Re: What’s next?

2021-05-26 Thread Ludovic Courtès
Hi,

Efraim Flashner  skribis:

> On Mon, May 17, 2021 at 10:13:10PM +0200, Ludovic Courtès wrote:
>> Hi,
>> 
>> Efraim Flashner  skribis:
>> 
>> > package-transformations applied to the operating-system field of the
>> > os-config.
>> 
>> Ah, that’s a good one, but possibly tricky!  What would it operate on?
>> Any package?  Only those showing up in the system profile?
>> 
>> The former is not really possible; the latter is.
>> 
>> Ludo’.
>> 
>
> The idea is everything in the system declaration. So the global
> packages, the guix-daemon from the guix-daemon service,

Yeah, that’s not easily possible, in the sense that there’s no place to
“hook” to perform such transforms.  The reason is that references to
packages may be buried down anywhere in gexps or files produced by
services, an obvious example being:

  #~(make-forkexec-constructor
      (list #$(file-append some-package "/bin/xyz")
            …))

The reference to ‘some-package’ here is deep down.

An option would be to have a way to give ‘lower-object’ a custom
way to lower all the <package> records that it sees, for instance.  But
then, how does that interact with caches, etc.  Tricky!
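
Just to make the shape of that idea concrete, here is a Python toy, with nested lists and tuples standing in for gexps. `FakePackage`, `lower`, and the `transform` hook are all invented for this sketch; the hook is the hypothetical part, and caching is ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakePackage:
    name: str


def lower(obj, transform=lambda package: package):
    """Recursively lower `obj`, applying `transform` to every package found."""
    if isinstance(obj, FakePackage):
        # Stand-in for computing a store file name.
        return f"/store/{transform(obj).name}"
    if isinstance(obj, (list, tuple)):
        return [lower(item, transform) for item in obj]
    return obj  # strings and other literals pass through unchanged


# A package reference buried a few levels down, as in a service definition.
service = ["make-forkexec-constructor", [(FakePackage("some-package"), "/bin/xyz")]]
```

Calling `lower(service, my_transform)` with some hypothetical `my_transform` reaches the buried reference, but every caller of `lower` would have to agree on the same transform, which is where the interaction with caches gets tricky.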

Ludo’.