Re: [PATCH] gnu: Add minced.

2016-08-17 Thread Ben Woodcroft



On 16/08/16 22:34, Marius Bakke wrote:

Ben Woodcroft  writes:


Hi Marius,

Excellent to see others interested in packaging microbial bioinformatics
tools.

You may want to look here before packaging other microbio tools:

https://github.com/MRC-CLIMB/guix-climb

I'm currently working on upstreaming most of these :)
Cool, and they generally look in quite good shape too. I'll be away next 
week, but I'd be happy to review them after that.



I tried this in a container and it seems that it calls out to a few
programs preventing it from working:

[env]# minced -h
/gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 7:
dirname: command not found
/gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 9:
dirname: command not found
/gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 11:
basename: command not found

So I think that (in order of preference), the source files themselves
should be patched with the absolute paths to these tools, the binary
should be wrapped, or coreutils should be propagated. For the first two
options, coreutils should be an input.

Good catch. Since the "minced" executable is just a wrapper script, I
opted to build my own wrapper to avoid the coreutils dependency.

Good idea.




Now some small comments on the patch itself.


+(arguments
+ `(#:test-target "test"
+   #:phases
+   (modify-phases %standard-phases
+ (delete 'configure)
+ (add-before 'check 'fix-test
+   (lambda _
+ ;; Fix test for latest version.
+ (substitute* "t/Aquifex_aeolicus_VF5.expected"
+   (("minced:0.1.6") "minced:0.2.0"

It might be more future-proof to use '(string-append "minced:"
,version)' instead of hard-coding 0.2.0.

I don't think this will be a problem in future releases. And it can also
cause problems, in case a user sets version to e.g. a commit hash.

OK.

Also, this phase (and the next two) should end in #t since the return
value of substitute* is undefined.



+ (add-before 'install 'qualify-java-path
+   (lambda* (#:key inputs #:allow-other-keys)
+ (substitute* "minced"
+   ;; Set full path to java binary in wrapper script.
+   (("^java") (string-append (assoc-ref inputs "jre")
+ "/bin/java")
+ (replace 'install
+   ;; No install target.
+   (lambda* (#:key outputs #:allow-other-keys)
+ (let* ((out (assoc-ref outputs "out"))
+(bin (string-append out "/bin")))
+   (for-each (lambda (file)
+   (install-file file bin))
+ (list "minced" "minced.jar"
+(native-inputs
+ `(("jdk", icedtea "jdk")))
+(inputs
+ `(("jre", icedtea)))

The commas should go after the space.

Oops, I should stop submitting patches late in the night!


+(home-page"https://github.com/ctSkennerton/minced;)
+(synopsis "Mining CRISPRs in Environmental Datasets")
+(description
+ "MinCED is a program to find Clustered Regularly Interspaced Short
+Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such
+as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.")

That description which you took from the README is a little dated at the
end. How about this?

"MinCED is a program to find Clustered Regularly Interspaced Short
Palindromic Repeats (CRISPRs) in both full genomes and shorter metagenomic 
sequences."

The rest LGTM. Thanks. Can you send an updated patch please?

Please find updated patch below. Thanks for the feedback!


Pushed as '318c0ae' after adding a space in the description and only 
using only using the "out" of icedtea in the inputs.


Thanks.
ben



Re: [PATCH] gnu: Add minced.

2016-08-16 Thread Marius Bakke

> Please find updated patch below. Thanks for the feedback!

Ugh. Missed a commit when squashing. Here is the final patch:

>From bb50f5c0d960625483e8db15a270a7f17d5d2d9c Mon Sep 17 00:00:00 2001
From: Marius Bakke <mba...@fastmail.com>
Date: Mon, 15 Aug 2016 16:06:37 +0100
Subject: [PATCH] gnu: Add minced.

* gnu/packages/bioinformatics.scm (minced): New variable.
---
 gnu/packages/bioinformatics.scm | 58 +
 1 file changed, 58 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index dd69e39..6a27e06 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -5,6 +5,7 @@
 ;;; Copyright © 2015 Andreas Enge <andr...@enge.fr>
 ;;; Copyright © 2016 Roel Janssen <r...@gnu.org>
 ;;; Copyright © 2016 Efraim Flashner <efr...@flashner.co.il>
+;;; Copyright © 2016 Marius Bakke <mba...@fastmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -40,6 +41,7 @@
   #:use-module (gnu packages autotools)
   #:use-module (gnu packages algebra)
   #:use-module (gnu packages base)
+  #:use-module (gnu packages bash)
   #:use-module (gnu packages bison)
   #:use-module (gnu packages boost)
   #:use-module (gnu packages compression)
@@ -3118,6 +3120,62 @@ probabilistic distances of genome abundance and tetranucleotide frequency.")
(license (license:non-copyleft "file://license.txt"
   "See license.txt in the distribution."
 
+(define-public minced
+  (package
+(name "minced")
+(version "0.2.0")
+(source (origin
+  (method url-fetch)
+  (uri (string-append
+"https://github.com/ctSkennerton/minced/archive/;
+version ".tar.gz"))
+  (file-name (string-append name "-" version ".tar.gz"))
+  (sha256
+   (base32
+"0wxmlsapxfpxfd3ps9636h7i2xy6la8i42mwh0j2lsky63h63jp1"
+(build-system gnu-build-system)
+(arguments
+ `(#:test-target "test"
+   #:phases
+   (modify-phases %standard-phases
+ (delete 'configure)
+ (add-before 'check 'fix-test
+   (lambda _
+ ;; Fix test for latest version.
+ (substitute* "t/Aquifex_aeolicus_VF5.expected"
+   (("minced:0.1.6") "minced:0.2.0"))
+ #t))
+ (replace 'install ; No install target.
+   (lambda* (#:key inputs outputs #:allow-other-keys)
+ (let* ((out (assoc-ref outputs "out"))
+(bin (string-append out "/bin"))
+(wrapper (string-append bin "/minced")))
+   ;; Minced comes with a wrapper script that tries to figure out where
+   ;; it is located before running the JAR. Since these paths are known
+   ;; to us, we build our own wrapper to avoid coreutils dependency.
+   (install-file "minced.jar" bin)
+   (with-output-to-file wrapper
+ (lambda _
+   (display
+(string-append
+ "#!" (assoc-ref inputs "bash") "/bin/sh\n\n"
+ (assoc-ref inputs "jre") "/bin/java -jar "
+ bin "/minced.jar \"$@\"\n"
+   (chmod wrapper #o555)))
+(native-inputs
+ `(("jdk" ,icedtea "jdk")))
+(inputs
+ `(("bash" ,bash)
+   ("jre" ,icedtea)))
+(home-page "https://github.com/ctSkennerton/minced;)
+(synopsis "Mining CRISPRs in Environmental Datasets")
+(description
+ "MinCED is a program to find Clustered Regularly Interspaced Short
+Palindromic Repeats (CRISPRs) in DNA sequences. It can be used for unassembled
+metagenomic reads, but is mainly designed for full genomes and assembled
+metagenomic sequence.")
+(license license:gpl3+)))
+
 (define-public miso
   (package
 (name "miso")
-- 
2.9.2



Re: [PATCH] gnu: Add minced.

2016-08-16 Thread Marius Bakke
Ben Woodcroft <b.woodcr...@uq.edu.au> writes:

> Hi Marius,
>
> Excellent to see others interested in packaging microbial bioinformatics 
> tools.

You may want to look here before packaging other microbio tools:

https://github.com/MRC-CLIMB/guix-climb

I'm currently working on upstreaming most of these :)

> I tried this in a container and it seems that it calls out to a few 
> programs preventing it from working:
>
> [env]# minced -h
> /gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 7: 
> dirname: command not found
> /gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 9: 
> dirname: command not found
> /gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 11: 
> basename: command not found
>
> So I think that (in order of preference), the source files themselves 
> should be patched with the absolute paths to these tools, the binary 
> should be wrapped, or coreutils should be propagated. For the first two 
> options, coreutils should be an input.

Good catch. Since the "minced" executable is just a wrapper script, I
opted to build my own wrapper to avoid the coreutils dependency.

> Now some small comments on the patch itself.
>
>> +(arguments
>> + `(#:test-target "test"
>> +   #:phases
>> +   (modify-phases %standard-phases
>> + (delete 'configure)
>> + (add-before 'check 'fix-test
>> +   (lambda _
>> + ;; Fix test for latest version.
>> + (substitute* "t/Aquifex_aeolicus_VF5.expected"
>> +   (("minced:0.1.6") "minced:0.2.0"
> It might be more future-proof to use '(string-append "minced:" 
> ,version)' instead of hard-coding 0.2.0.

I don't think this will be a problem in future releases. And it can also
cause problems, in case a user sets version to e.g. a commit hash.

> Also, this phase (and the next two) should end in #t since the return 
> value of substitute* is undefined.
>
>
>> + (add-before 'install 'qualify-java-path
>> +   (lambda* (#:key inputs #:allow-other-keys)
>> + (substitute* "minced"
>> +   ;; Set full path to java binary in wrapper script.
>> +   (("^java") (string-append (assoc-ref inputs "jre")
>> + "/bin/java")
>> + (replace 'install
>> +   ;; No install target.
>> +   (lambda* (#:key outputs #:allow-other-keys)
>> + (let* ((out (assoc-ref outputs "out"))
>> +(bin (string-append out "/bin")))
>> +   (for-each (lambda (file)
>> +   (install-file file bin))
>> + (list "minced" "minced.jar"
>> +(native-inputs
>> + `(("jdk", icedtea "jdk")))
>> +(inputs
>> + `(("jre", icedtea)))
> The commas should go after the space.

Oops, I should stop submitting patches late in the night!

>> +(home-page"https://github.com/ctSkennerton/minced;)
>> +(synopsis "Mining CRISPRs in Environmental Datasets")
>> +(description
>> + "MinCED is a program to find Clustered Regularly Interspaced Short
>> +Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such
>> +as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.")
> That description which you took from the README is a little dated at the 
> end. How about this?
>
> "MinCED is a program to find Clustered Regularly Interspaced Short
> Palindromic Repeats (CRISPRs) in both full genomes and shorter metagenomic 
> sequences."
>
> The rest LGTM. Thanks. Can you send an updated patch please?

Please find updated patch below. Thanks for the feedback!

>From 53602b0908956d122146baafdc1a541596df151d Mon Sep 17 00:00:00 2001
From: Marius Bakke <mba...@fastmail.com>
Date: Mon, 15 Aug 2016 16:06:37 +0100
Subject: [PATCH] gnu: Add minced.

* gnu/packages/bioinformatics.scm (minced): New variable.
---
 gnu/packages/bioinformatics.scm | 55 +
 1 file changed, 55 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index dd69e39..4869d56 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -5,6 +5,7 @@
 ;;; Copyright © 2015 Andreas Enge <andr...@enge.fr>
 ;;; Copyright © 2016 Roel Janssen <r...@gnu.org>
 ;;; Copyright © 2016 Efraim Flashner <efr...@flashner.co.il>
+;;; Copyright © 2016 Mari

Re: [PATCH] gnu: Add minced.

2016-08-16 Thread Ben Woodcroft

Hi again,


On 16/08/16 14:59, Ben Woodcroft wrote:

[..]

+(home-page"https://github.com/ctSkennerton/minced;)
+(synopsis "Mining CRISPRs in Environmental Datasets")
+(description
+ "MinCED is a program to find Clustered Regularly Interspaced Short
+Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such
+as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.")
That description which you took from the README is a little dated at 
the end. How about this?

"MinCED is a program to find Clustered Regularly Interspaced Short
Palindromic Repeats (CRISPRs) in both full genomes and shorter metagenomic 
sequences."
I happen to know Connor, the maintainer of this package, and he suggests 
that it is more aimed at assembled sequence data than reads (for which 
he suggests the crass program is better). So how about this instead?


"MinCED is a program to find Clustered Regularly Interspaced Short 
Palindromic Repeats (CRISPRs) in DNA sequences. It can be used for 
unassembled metagenomic reads, but is mainly designed for full genomes 
and assembled metagenomic sequence."


ben


Re: [PATCH] gnu: Add minced.

2016-08-15 Thread Ben Woodcroft

Hi Marius,

Excellent to see others interested in packaging microbial bioinformatics 
tools.



I tried this in a container and it seems that it calls out to a few 
programs preventing it from working:


[env]# minced -h
/gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 7: 
dirname: command not found
/gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 9: 
dirname: command not found
/gnu/store/8asw2i54x53rrwdr8qw4j2rpbkc9fqzz-profile/bin/minced: line 11: 
basename: command not found


So I think that (in order of preference), the source files themselves 
should be patched with the absolute paths to these tools, the binary 
should be wrapped, or coreutils should be propagated. For the first two 
options, coreutils should be an input.


Now some small comments on the patch itself.


+(arguments
+ `(#:test-target "test"
+   #:phases
+   (modify-phases %standard-phases
+ (delete 'configure)
+ (add-before 'check 'fix-test
+   (lambda _
+ ;; Fix test for latest version.
+ (substitute* "t/Aquifex_aeolicus_VF5.expected"
+   (("minced:0.1.6") "minced:0.2.0"
It might be more future-proof to use '(string-append "minced:" 
,version)' instead of hard-coding 0.2.0.


Also, this phase (and the next two) should end in #t since the return 
value of substitute* is undefined.




+ (add-before 'install 'qualify-java-path
+   (lambda* (#:key inputs #:allow-other-keys)
+ (substitute* "minced"
+   ;; Set full path to java binary in wrapper script.
+   (("^java") (string-append (assoc-ref inputs "jre")
+ "/bin/java")
+ (replace 'install
+   ;; No install target.
+   (lambda* (#:key outputs #:allow-other-keys)
+ (let* ((out (assoc-ref outputs "out"))
+(bin (string-append out "/bin")))
+   (for-each (lambda (file)
+   (install-file file bin))
+ (list "minced" "minced.jar"
+(native-inputs
+ `(("jdk", icedtea "jdk")))
+(inputs
+ `(("jre", icedtea)))

The commas should go after the space.



+(home-page"https://github.com/ctSkennerton/minced;)
+(synopsis "Mining CRISPRs in Environmental Datasets")
+(description
+ "MinCED is a program to find Clustered Regularly Interspaced Short
+Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such
+as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.")
That description which you took from the README is a little dated at the 
end. How about this?


"MinCED is a program to find Clustered Regularly Interspaced Short
Palindromic Repeats (CRISPRs) in both full genomes and shorter metagenomic 
sequences."

The rest LGTM. Thanks. Can you send an updated patch please?

ben


[PATCH] gnu: Add minced.

2016-08-15 Thread Marius Bakke
>From 017a593d407a36ca98736b95b7413f180a7735d4 Mon Sep 17 00:00:00 2001
From: Marius Bakke <mba...@fastmail.com>
Date: Mon, 15 Aug 2016 16:06:37 +0100
Subject: [PATCH] gnu: Add minced.

* gnu/packages/bioinformatics.scm (minced): New variable.
---
 gnu/packages/bioinformatics.scm | 51 +
 1 file changed, 51 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index a3f0d81..e76dc4d 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -5,6 +5,7 @@
 ;;; Copyright © 2015 Andreas Enge <andr...@enge.fr>
 ;;; Copyright © 2016 Roel Janssen <r...@gnu.org>
 ;;; Copyright © 2016 Efraim Flashner <efr...@flashner.co.il>
+;;; Copyright © 2016 Marius Bakke <mba...@fastmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -3117,6 +3118,56 @@ probabilistic distances of genome abundance and tetranucleotide frequency.")
(license (license:non-copyleft "file://license.txt"
   "See license.txt in the distribution."
 
+(define-public minced
+  (package
+(name "minced")
+(version "0.2.0")
+(source (origin
+  (method url-fetch)
+  (uri (string-append
+"https://github.com/ctSkennerton/minced/archive/;
+version ".tar.gz"))
+  (file-name (string-append name "-" version ".tar.gz"))
+  (sha256
+   (base32
+"0wxmlsapxfpxfd3ps9636h7i2xy6la8i42mwh0j2lsky63h63jp1"
+(build-system gnu-build-system)
+(arguments
+ `(#:test-target "test"
+   #:phases
+   (modify-phases %standard-phases
+ (delete 'configure)
+ (add-before 'check 'fix-test
+   (lambda _
+ ;; Fix test for latest version.
+ (substitute* "t/Aquifex_aeolicus_VF5.expected"
+   (("minced:0.1.6") "minced:0.2.0"
+ (add-before 'install 'qualify-java-path
+   (lambda* (#:key inputs #:allow-other-keys)
+ (substitute* "minced"
+   ;; Set full path to java binary in wrapper script.
+   (("^java") (string-append (assoc-ref inputs "jre")
+ "/bin/java")
+ (replace 'install
+   ;; No install target.
+   (lambda* (#:key outputs #:allow-other-keys)
+ (let* ((out (assoc-ref outputs "out"))
+(bin (string-append out "/bin")))
+   (for-each (lambda (file)
+   (install-file file bin))
+ (list "minced" "minced.jar"
+(native-inputs
+ `(("jdk", icedtea "jdk")))
+(inputs
+ `(("jre", icedtea)))
+(home-page "https://github.com/ctSkennerton/minced;)
+(synopsis "Mining CRISPRs in Environmental Datasets")
+(description
+ "MinCED is a program to find Clustered Regularly Interspaced Short
+Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such
+as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.")
+(license license:gpl3+)))
+
 (define-public miso
   (package
 (name "miso")
-- 
2.9.2