Bug#933661: metaphlan2: Port to Python3 needed

Steve Langasek Thu, 15 Aug 2019 17:15:45 -0700

Package: metaphlan2
Followup-For: Bug #933661
User: ubuntu-de...@lists.ubuntu.com
Usertags: origin-ubuntu eoan ubuntu-patch


Hi Andreas,

Prompted by an interest in dropping python-pandas from Ubuntu rather than
fixing its failing tests, I took a look at moving metaphlan2 to python3.
Using the upstream 2.9.19 release
(https://bitbucket.org/biobakery/metaphlan2/get/2.9.19.tar.bz2) this is
fairly straightforward. However, I find that 2.9 wants the databases in a
different format than the one made available in the metaphlan2-data
package; it needs the new
https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v29_CHOCOPhlAn_201901.tar
database file instead of the current v20 file.

Since this data tarball will need repacking and some changes to the postinst
script (e.g. new metaphlan2 wants the database under
/usr/share/metaphlan2/metaphlan_databases instead of
/usr/share/metaphlan2/db_v20; and the input is no longer a fasta file but a
.fna.bz2), at least for now I'm not going to upload this change to Ubuntu.
But I'm attaching the debdiff with my work in progress for the metaphlan2
package, for your consideration.
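
To illustrate the input change mentioned above, here is a minimal Python
sketch of the decompression step a repacking helper might need. It is not
the actual postinst: the function name unpack_fna_bz2 and the example file
name in the trailing comment are assumptions made for illustration; only the
target directory comes from the description above.

```python
import bz2
import shutil
from pathlib import Path

# Database directory that new metaphlan2 expects, per the message above.
NEW_DB_DIR = Path("/usr/share/metaphlan2/metaphlan_databases")


def unpack_fna_bz2(src: Path, dest_dir: Path) -> Path:
    """Decompress a .fna.bz2 file into dest_dir and return the .fna path.

    Hypothetical helper: the real maintainer script may do this differently.
    """
    if src.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src.name}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the trailing ".bz2", leaving "<name>.fna".
    dest = dest_dir / src.stem
    # Stream the data so a large database is never held in memory at once.
    with bz2.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest


# Example call (the file name is an assumption, not checked against the tarball):
# unpack_fna_bz2(NEW_DB_DIR / "mpa_v29_CHOCOPhlAn_201901.fna.bz2", NEW_DB_DIR)
```

The resulting plain FASTA file could then be handed to bowtie2-build, as the
current postinst presumably does with the v20 fasta input.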

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slanga...@ubuntu.com                                     vor...@debian.org

diff -Nru metaphlan2-2.7.8/debian/control metaphlan2-2.9.19/debian/control
--- metaphlan2-2.7.8/debian/control     2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/control    2019-08-15 13:12:57.000000000 -0700
@@ -4,7 +4,7 @@
 Section: science
 Priority: optional
 Build-Depends: debhelper (>= 11~),
-               python-all,
+               python3-all,
                dh-python,
                pandoc,
                bowtie2
@@ -15,12 +15,12 @@
 
 Package: metaphlan2
 Architecture: all
-Depends: ${python:Depends},
+Depends: ${python3:Depends},
          ${misc:Depends},
          metaphlan2-data,
-         python-biom-format,
-         python-msgpack,
-         python-pandas,
+         python3-biom-format,
+         python3-msgpack,
+         python3-pandas,
          bowtie2
 Description: Metagenomic Phylogenetic Analysis
  MetaPhlAn is a computational tool for profiling the composition of
diff -Nru metaphlan2-2.7.8/debian/patches/_metaphlan2.py.patch metaphlan2-2.9.19/debian/patches/_metaphlan2.py.patch
--- metaphlan2-2.7.8/debian/patches/_metaphlan2.py.patch        2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/_metaphlan2.py.patch       2019-08-15 13:12:57.000000000 -0700
@@ -9,8 +9,10 @@
  support function annotations: These are optional in Python 3, and are
  removed from the function definitions in "_metaphlan2.py" by the patch.
 
---- a/_metaphlan2.py
-+++ b/_metaphlan2.py
+Index: metaphlan2-2.9.19/_metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/_metaphlan2.py
++++ metaphlan2-2.9.19/_metaphlan2.py
 @@ -3,7 +3,7 @@
  # This module defines the functions which run MetaPhlAn2 on
  # single and paired fastq data.
@@ -20,8 +22,8 @@
  import subprocess as sb
  from q2_types.per_sample_sequences import SingleLanePerSampleSingleEndFastqDirFmt
  from q2_types.per_sample_sequences import SingleLanePerSamplePairedEndFastqDirFmt
-@@ -24,8 +24,7 @@ def metaphlan2_helper(raw_data, nproc, i
-     sb.run(cmd, check=True)
+@@ -30,8 +30,7 @@
+           'doi: https://doi.org/10.1038/nmeth.3589', end='\n\n')
  
  
 -def profile_single_fastq(raw_data: SingleLanePerSampleSingleEndFastqDirFmt,
@@ -30,7 +32,7 @@
      output_biom = None
  
      with tempfile.TemporaryDirectory() as tmp_dir:
-@@ -36,8 +35,7 @@ def profile_single_fastq(raw_data: Singl
+@@ -42,8 +41,7 @@
      return output_biom
  
  
diff -Nru metaphlan2-2.7.8/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch metaphlan2-2.9.19/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch
--- metaphlan2-2.7.8/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch       2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch      2019-08-15 13:12:57.000000000 -0700
@@ -5,182 +5,222 @@
  .
  The doc is also adapted to this change.
 
---- a/metaphlan2.py
-+++ b/metaphlan2.py
-@@ -417,7 +417,7 @@ def read_params(args):
+Index: metaphlan2-2.9.19/metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/metaphlan2.py
++++ metaphlan2-2.9.19/metaphlan2.py
+@@ -143,7 +143,7 @@
  
              "*  You can also provide an externally BowTie2-mapped SAM if you 
specify this format with \n"
              "   --input_type. Two steps: first apply BowTie2 and then feed 
MetaPhlAn2 with the obtained sam:\n"
--            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq\n"
-+            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x /usr/share/metaphlan2/db_v20/mpa_v20_m200 -U 
metagenome.fastq\n"
-             "$ metaphlan2.py metagenome.sam --input_type sam > 
profiled_metagenome.txt\n\n"
+-            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x ${mpa_dir}/metaphlan_databases/mpa_v25_CHOCOPhlAn_201901 -U 
metagenome.fastq\n"
++            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x 
/usr/share/metaphlan2/metaphlan_databases/mpa_v25_CHOCOPhlAn_201901 -U 
metagenome.fastq\n"
+             "$ metaphlan2.py metagenome.sam --input_type sam -o 
profiled_metagenome.txt\n\n"
  
-             # "*  Multiple alternative ways to pass the input are also 
available:\n"
-@@ -1391,7 +1391,7 @@ def metaphlan2():
+             "*  We can also natively handle paired-end metagenomes, and, more 
generally, metagenomes stored in \n"
+@@ -1154,7 +1154,7 @@
      # check for the mpa_pkl file
      if not os.path.isfile(pars['mpa_pkl']):
          sys.stderr.write("Error: Unable to find the mpa_pkl file at: " + 
pars['mpa_pkl'] +
 -                         "\nExpecting location 
${mpa_dir}/db_v20/map_v20_m200.pkl "
-+                         "\nExpecting location 
/usr/share/metaphlan2/db_v20/mpa_v20_m200.pkl "
-                          "\nSelect the file location with the option 
--mpa_pkl.\n"
++                         "\nExpecting location 
/usr/share/metaphlan2/db_v20/map_v20_m200.pkl "
                           "Exiting...\n\n")
          sys.exit(1)
---- a/README.md
-+++ b/README.md
-@@ -86,33 +86,27 @@ In case you moved the `metaphlan2.py` sc
- 
- This section presents some basic usages of MetaPhlAn2, for more advanced 
usages, please see at [its 
wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
- 
--We assume here that ``metaphlan2.py`` is in the system path and that 
``mpa_dir`` bash variable contains the main MetaPhlAn folder. You can set this 
two variables moving to your MetaPhlAn2 local folder and type:
--
--```
--#!bash
--$ export PATH=`pwd`:$PATH
--$ export mpa_dir=`pwd`
--```
-+We assume here that ``metaphlan2`` is in the system path.
- 
- Here is the basic example to profile a metagenome from raw reads (requires 
BowTie2 in the system path with execution and read permissions, Perl 
installed). 
- 
- ```
- #!bash
--$ metaphlan2.py metagenome.fastq --input_type fastq > profiled_metagenome.txt
-+$ metaphlan2 metagenome.fastq --input_type fastq > profiled_metagenome.txt
- ```
- 
- It is highly recommended to save the intermediate BowTie2 output for 
re-running MetaPhlAn extremely quickly (`--bowtie2out`), and use multiple CPUs 
(`--nproc`) if available:
- 
- ```
- #!bash
--$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 
5 --input_type fastq > profiled_metagenome.txt
-+$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 
--input_type fastq > profiled_metagenome.txt
- ```
- 
- If you already mapped your metagenome against the marker DB (using a previous 
 MetaPhlAn run), you can obtain the results in few seconds by using the 
previously saved `--bowtie2out` file and specifying the input (`--input_type 
bowtie2out`):
- 
- ```
- #!bash
--$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out > 
profiled_metagenome.txt
-+$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out > 
profiled_metagenome.txt
- ```
- 
- You can also provide an externally BowTie2-mapped SAM if you specify this 
format with `--input_type`. Two steps here: first map your metagenome with 
BowTie2 and then feed MetaPhlAn2 with the obtained sam:
-@@ -120,14 +114,14 @@ You can also provide an externally BowTi
- ```
- #!bash
- $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x databases/mpa_v20_m200 -U metagenome.fastq
--$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
-+$ metaphlan2 metagenome.sam --input_type sam > profiled_metagenome.txt
- ```
- 
- MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not 
use the paired-end information), and, more generally, metagenomes stored in 
multiple files (but you need to specify the --bowtie2out parameter):
- 
- ```
- #!bash
--$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
-+$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
- ```
- 
- For advanced options and other analysis types (such as strain tracking) 
please refer to the full command-line options.
-@@ -136,7 +130,7 @@ For advanced options and other analysis
- 
- 
- ```
--usage: metaphlan2.py --input_type
-+usage: metaphlan2 --input_type
-                      {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
-                      [--mpa_pkl MPA_PKL] [--bowtie2db METAPHLAN_BOWTIE2_DB]
-                      [--bt2_ps BowTie2 presets] [--bowtie2_exe BOWTIE2_EXE]
-@@ -161,7 +155,7 @@ AUTHORS: Nicola Segata (nicola.segata@un
- 
- COMMON COMMANDS
- 
-- We assume here that metaphlan2.py is in the system path and that mpa_dir 
bash variable contains the
-+ We assume here that metaphlan2 is in the system path and that mpa_dir bash 
variable contains the
-  main MetaPhlAn folder. Also BowTie2 should be in the system path with 
execution and read
-  permissions, and Perl should be installed.
- 
-@@ -172,25 +166,25 @@ strains in particular cases) present in
- relative abundance. This correspond to the default analysis type 
(--analysis_type rel_ab).
- 
- *  Profiling a metagenome from raw reads:
--$ metaphlan2.py metagenome.fastq --input_type fastq
-+$ metaphlan2 metagenome.fastq --input_type fastq
- 
- *  You can take advantage of multiple CPUs and save the intermediate BowTie2 
output for re-running
-    MetaPhlAn extremely quickly:
--$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 
5 --input_type fastq
-+$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 
--input_type fastq
- 
- *  If you already mapped your metagenome against the marker DB (using a 
previous MetaPhlAn run), you
-    can obtain the results in few seconds by using the previously saved 
--bowtie2out file and 
-    specifying the input (--input_type bowtie2out):
--$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out
-+$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out
- 
- *  You can also provide an externally BowTie2-mapped SAM if you specify this 
format with 
-    --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with 
the obtained sam:
- $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x databases/mpa_v20_m200 -U metagenome.fastq
--$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
-+$ metaphlan2 metagenome.sam --input_type sam > profiled_metagenome.txt
- 
- *  We can also natively handle paired-end metagenomes, and, more generally, 
metagenomes stored in 
-   multiple files (but you need to specify the --bowtie2out parameter):
--$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
-+$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
- 
- ------------------------------------------------------------------- 
-  
-@@ -208,23 +202,23 @@ file saved during the execution of the d
- *  The following command will output the abundance of each marker with a RPK 
(reads per kil-base) 
-    higher 0.0. (we are assuming that metagenome_outfmt.bz2 has been generated 
before as 
-    shown above).
--$ metaphlan2.py -t marker_ab_table metagenome_outfmt.bz2 --input_type 
bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out 
> marker_abundance_table.txt
-    The obtained RPK can be optionally normalized by the total number of reads 
in the metagenome 
-    to guarantee fair comparisons of abundances across samples. The number of 
reads in the metagenome
-    needs to be passed with the '--nreads' argument
- 
- *  The list of markers present in the sample can be obtained with '-t 
marker_pres_table'
--$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type 
bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t marker_pres_table metagenome_outfmt.bz2 --input_type 
bowtie2out > marker_abundance_table.txt
-    The --pres_th argument (default 1.0) set the minimum RPK value to consider 
a marker present
- 
- *  The list '-t clade_profiles' analysis type reports the same information of 
'-t marker_ab_table'
-    but the markers are reported on a clade-by-clade basis.
--$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type 
bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out 
> marker_abundance_table.txt
- 
- *  Finally, to obtain all markers present for a specific clade and all its 
subclades, the 
-    '-t clade_specific_strain_tracker' should be used. For example, the 
following command
-    is reporting the presence/absence of the markers for the B. fragulis 
species and its strains
--$ metaphlan2.py -t clade_specific_strain_tracker --clade 
s__Bacteroides_fragilis metagenome_outfmt.bz2 databases/mpa_v20_m200.pkl 
--input_type bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis 
metagenome_outfmt.bz2 databases/mpa_v20_m200.pkl --input_type bowtie2out > 
marker_abundance_table.txt
-    the optional argument --min_ab specifies the minimum clade abundance for 
reporting the markers
- 
- ------------------------------------------------------------------- 
-@@ -521,7 +515,7 @@ pickle.dump(db, ofile, pickle.HIGHEST_PR
- ofile.close()
- ```
- 
--* To use the new database, switch to metaphlan2/db_v21 instead of 
metaphlan2/db\_v20 when running metaphlan2.py with option "--mpa\_pkl".
-+* To use the new database, switch to metaphlan2/db_v21 instead of 
metaphlan2/db\_v20 when running metaphlan2 with option "--mpa\_pkl".
- 
- 
- ## Metagenomic strain-level population genomics
-@@ -591,7 +585,7 @@ for f in $(ls fastqs/*.bz2)
- do
-     echo "Running metaphlan2 on ${f}"
-     bn=$(basename ${f} | cut -d . -f 1)
--    tar xjfO ${f} | ../metaphlan2.py --bowtie2db ../databases/mpa_v20_m200 
--mpa_pkl ../databases/mpa_v20_m200.pkl --input_type multifastq --nproc 10 -s 
sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
-+    tar xjfO ${f} | metaphlan2 --bowtie2db 
/usr/share/metaphlan2/db_v20/mpa_v20_m200 --mpa_pkl 
/usr/share/metaphlan2/db_v20/mpa_v20_m200.pkl --input_type multifastq --nproc 
10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o 
sams/${bn}.profile
- done
- ```
- 
-@@ -731,4 +725,4 @@ In the output folder, you can find the f
- 1. clade_name.fasta: the alignment file of all metagenomic strains.
- 3. *.marker_pos: this file shows the starting position of each marker in the 
strains.
- 3. *.info: this file shows the general information like the total length of 
the concatenated markers (full sequence length), number of used markers, etc.
+ 
+Index: metaphlan2-2.9.19/README.md
+===================================================================
+--- metaphlan2-2.9.19.orig/README.md
++++ metaphlan2-2.9.19/README.md
+@@ -107,14 +107,14 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py --install 
++$ metaphlan2 --install 
+ ```
+ 
+ By default, the latest MetaPhlAn2 database is downloaded and built. You can 
download a specific version with the `--index` parameter
+ 
+ ```
+ #!bash
+-$ metaphlan2.py --install --index v29_CHOCOPhlAn_201901
++$ metaphlan2 --install --index v29_CHOCOPhlAn_201901
+ ```
+ 
+ --------------------------
+@@ -123,19 +123,13 @@
+ 
+ This section presents some basic usages of MetaPhlAn2, for more advanced 
usages, please see at [its 
wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
+ 
+-We assume here that ``metaphlan2.py`` is in the system path and that 
``mpa_dir`` bash variable contains the main MetaPhlAn folder. You can set this 
two variables moving to your MetaPhlAn2 local folder and type:
+-
+-```
+-#!bash
+-$ export PATH=`pwd`:$PATH
+-$ export mpa_dir=`pwd`
+-```
++We assume here that ``metaphlan2`` is in the system path.
+ 
+ Here is the basic example to profile a metagenome from raw reads (requires 
BowTie2 in the system path with execution and read permissions, Perl 
installed). 
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.fastq --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --input_type fastq -o profiled_metagenome.txt
+ ```
+ 
+ ### Starting from version 2.9, MetaPhlAn2 estimates the fraction of the 
metagenome composed by microbes that are unknown. The relative abundance 
profile is scaled according the percentage of reads mapping to a known clade. 
+@@ -146,7 +140,7 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 
5 --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 
--input_type fastq -o profiled_metagenome.txt
+ ```
+ 
+ 
+@@ -154,7 +148,7 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o 
profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o 
profiled_metagenome.txt
+ ```
+ 
+ `bowtie2out` files generated with MetaPhlAn2 versions below 2.9 are not 
compatibile. Starting from MetaPhlAn2 2.9, the BowTie2 ouput now includes the 
size of the profiled metagenome.
+@@ -162,7 +156,7 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out 
--nreads 520000 -o profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out 
--nreads 520000 -o profiled_metagenome.txt
+ ```
+ 
+ You can also provide an externally BowTie2-mapped SAM if you specify this 
format with `--input_type`. Two steps here: first map your metagenome with 
BowTie2 and then feed MetaPhlAn2 with the obtained sam:
+@@ -170,14 +164,14 @@
+ ```
+ #!bash
+ $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x metaphlan_databases/mpa_v29_CHOCOPhlAn_201901 -U 
metagenome.fastq
+-$ metaphlan2.py metagenome.sam --input_type sam -o profiled_metagenome.txt
++$ metaphlan2 metagenome.sam --input_type sam -o profiled_metagenome.txt
+ ```
+ 
+ MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not 
use the paired-end information), and, more generally, metagenomes stored in 
multiple files (but you need to specify the --bowtie2out parameter):
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
+ ```
+ 
+ You can provide the specific database version with `--index`. 
+@@ -189,7 +183,7 @@
+ 
+ ## Full command-line options
+ ```
+-usage: metaphlan2.py --input_type
++usage: metaphlan2 --input_type
+                      {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
+                      [--mpa_pkl MPA_PKL] [--force]
+                      [--bowtie2db METAPHLAN_BOWTIE2_DB] [-x INDEX]
+@@ -219,7 +213,7 @@
+ 
+ COMMON COMMANDS
+ 
+- We assume here that metaphlan2.py is in the system path and that mpa_dir 
bash variable contains the
++ We assume here that metaphlan2 is in the system path and that mpa_dir bash 
variable contains the
+  main MetaPhlAn folder. Also BowTie2 should be in the system path with 
execution and read
+  permissions, and Perl should be installed)
+ 
+@@ -230,30 +224,30 @@
+ relative abundance. This correspond to the default analysis type (-t rel_ab).
+ 
+ *  Profiling a metagenome from raw reads:
+-$ metaphlan2.py metagenome.fastq --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --input_type fastq -o profiled_metagenome.txt
+ 
+ *  You can take advantage of multiple CPUs and save the intermediate BowTie2 
output for re-running
+    MetaPhlAn extremely quickly:
+-$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 
5 --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 
--input_type fastq -o profiled_metagenome.txt
+ 
+ *  If you already mapped your metagenome against the marker DB (using a 
previous MetaPhlAn run), you
+    can obtain the results in few seconds by using the previously saved 
--bowtie2out file and
+    specifying the input (--input_type bowtie2out):
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o 
profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o 
profiled_metagenome.txt
+ 
+ *  bowtie2out files generated with MetaPhlAn2 versions below 2.9 are not 
compatibile.
+    Starting from MetaPhlAn2 2.9, the BowTie2 ouput now includes the size of 
the profiled metagenome.
+    If you want to re-run MetaPhlAn2 using these file you should provide the 
metagenome size via --nreads:
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out 
--nreads 520000 -o profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out 
--nreads 520000 -o profiled_metagenome.txt
+ 
+ *  You can also provide an externally BowTie2-mapped SAM if you specify this 
format with
+    --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with 
the obtained sam:
+ $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S 
metagenome.sam -x ${mpa_dir}/metaphlan_databases/mpa_v25_CHOCOPhlAn_201901 -U 
metagenome.fastq
+-$ metaphlan2.py metagenome.sam --input_type sam -o profiled_metagenome.txt
++$ metaphlan2 metagenome.sam --input_type sam -o profiled_metagenome.txt
+ 
+ *  We can also natively handle paired-end metagenomes, and, more generally, 
metagenomes stored in
+   multiple files (but you need to specify the --bowtie2out parameter):
+-$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
++$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out 
metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
+ 
+ -------------------------------------------------------------------
+ 
+@@ -271,25 +265,25 @@
+ *  The following command will output the abundance of each marker with a RPK 
(reads per kilo-base)
+    higher 0.0. (we are assuming that metagenome_outfmt.bz2 has been generated 
before as
+    shown above).
+-$ metaphlan2.py -t marker_ab_table metagenome_outfmt.bz2 --input_type 
bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out 
-o marker_abundance_table.txt
+    The obtained RPK can be optionally normalized by the total number of reads 
in the metagenome
+    to guarantee fair comparisons of abundances across samples. The number of 
reads in the metagenome
+    needs to be passed with the '--nreads' argument
+ 
+ *  The list of markers present in the sample can be obtained with '-t 
marker_pres_table'
+-$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type 
bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t marker_pres_table metagenome_outfmt.bz2 --input_type 
bowtie2out -o marker_abundance_table.txt
+    The --pres_th argument (default 1.0) set the minimum RPK value to consider 
a marker present
+ 
+ *  The list '-t clade_profiles' analysis type reports the same information of 
'-t marker_ab_table'
+    but the markers are reported on a clade-by-clade basis.
+-$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type 
bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out 
-o marker_abundance_table.txt
+ 
+ *  Finally, to obtain all markers present for a specific clade and all its 
subclades, the
+    '-t clade_specific_strain_tracker' should be used. For example, the 
following command
+    is reporting the presence/absence of the markers for the B. fragulis 
species and its strains
+    the optional argument --min_ab specifies the minimum clade abundance for 
reporting the markers
+ 
+-$ metaphlan2.py -t clade_specific_strain_tracker --clade 
s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out -o 
marker_abundance_table.txt
++$ metaphlan2 -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis 
metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
+ 
+ -------------------------------------------------------------------
+ 
+@@ -539,7 +533,7 @@
+ 
+ ```
+ 
+-* To use the new database, run metaphlan2.py with option "--index 
v25_CHOCOPhlAn_NEW".
++* To use the new database, run metaphlan2 with option "--index 
v25_CHOCOPhlAn_NEW".
+ 
+ 
+ ## Metagenomic strain-level population genomics
+@@ -611,7 +605,7 @@
+ do
+     echo "Running metaphlan2 on ${f}"
+     bn=$(basename ${f} | cut -d '.' -f 1)
+-     ../metaphlan2.py --index v29_CHOCOPhlAn_201901 --input_type multifastq 
--nproc 10s -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o 
ssams/${bn}.profile ${f}
++     metaphlan2 --index v29_CHOCOPhlAn_201901 --input_type multifastq --nproc 
10s -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o 
ssams/${bn}.profile ${f}
+ done
+ ```
+ 
+@@ -751,4 +745,4 @@
+ 1. clade_name.fasta: the alignment file of all metagenomic strains.
+ 3. *.marker_pos: this file shows the starting position of each marker in the 
strains.
+ 3. *.info: this file shows the general information like the total length of 
the concatenated markers (full sequence length), number of used markers, etc.
 -4. *.polymorphic: this file shows the statistics on the polymorphic site, 
where "sample" is the sample name, "percentage\_of\_polymorphic_sites" is the 
percentage of sites that are suspected to be polymorphic, "avg\_freq" is the 
average frequency of the dominant alleles on all polymorphic sites, 
"avg\_coverage" is the average coverage at all polymorphic sites.
 \ No newline at end of file
-+4. *.polymorphic: this file shows the statistics on the polymorphic site, 
where "sample" is the sample name, "percentage\_of\_polymorphic_sites" is the 
percentage of sites that are suspected to be polymorphic, "avg\_freq" is the 
average frequency of the dominant alleles on all polymorphic sites, 
"avg\_coverage" is the average coverage at all polymorphic sites.
++4. *.polymorphic: this file shows the statistics on the polymorphic site, 
where "sample" is the sample name, "percentage\_of\_polymorphic_sites" is the 
percentage of sites that are suspected to be polymorphic, "avg\_freq" is the 
average frequency of the dominant alleles on all polymorphic sites, 
"avg\_coverage" is the average coverage at all polymorphic sites.
diff -Nru metaphlan2-2.7.8/debian/patches/python3.patch metaphlan2-2.9.19/debian/patches/python3.patch
--- metaphlan2-2.7.8/debian/patches/python3.patch       1969-12-31 16:00:00.000000000 -0800
+++ metaphlan2-2.9.19/debian/patches/python3.patch      2019-08-15 13:12:57.000000000 -0700
@@ -0,0 +1,74 @@
+Description: set interpreter to python3
+Author: Steve Langasek <steve.langa...@ubuntu.com>
+Last-Modified: 2019-08-15
+
+Index: metaphlan2-2.9.19/metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/metaphlan2.py
++++ metaphlan2-2.9.19/metaphlan2.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ from __future__ import with_statement
+ __author__ = ('Nicola Segata (nicola.seg...@unitn.it), '
+               'Duy Tin Truong, '
+Index: metaphlan2-2.9.19/strainphlan.py
+===================================================================
+--- metaphlan2-2.9.19.orig/strainphlan.py
++++ metaphlan2-2.9.19/strainphlan.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ # Author: Duy Tin Truong (duytin.tru...@unitn.it)
+ #             at CIBIO, University of Trento, Italy
+ 
+Index: metaphlan2-2.9.19/utils/extract_markers.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/extract_markers.py
++++ metaphlan2-2.9.19/utils/extract_markers.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ #Author: Duy Tin Truong (duytin.tru...@unitn.it)
+ #        at CIBIO, University of Trento, Italy
+ 
+Index: metaphlan2-2.9.19/utils/merge_metaphlan_tables.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/merge_metaphlan_tables.py
++++ metaphlan2-2.9.19/utils/merge_metaphlan_tables.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ 
+ import argparse
+ import os
+Index: metaphlan2-2.9.19/utils/metaphlan2krona.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/metaphlan2krona.py
++++ metaphlan2-2.9.19/utils/metaphlan2krona.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ 
+ # ==============================================================================
+ # Conversion script: from MetaPhlAn output to Krona text input file
+Index: metaphlan2-2.9.19/utils/plot_bug.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/plot_bug.py
++++ metaphlan2-2.9.19/utils/plot_bug.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python2
++#!/usr/bin/env python3
+ 
+ import sys
+ import numpy as np
+Index: metaphlan2-2.9.19/utils/read_fastx.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/read_fastx.py
++++ metaphlan2-2.9.19/utils/read_fastx.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ 
+ 
+ import sys
diff -Nru metaphlan2-2.7.8/debian/patches/series metaphlan2-2.9.19/debian/patches/series
--- metaphlan2-2.7.8/debian/patches/series      2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/series     2019-08-15 13:12:57.000000000 -0700
@@ -1,3 +1,4 @@
 mpa_dir-is-usr_share_metaphlan2.patch
 spelling.patch
 _metaphlan2.py.patch
+python3.patch
diff -Nru metaphlan2-2.7.8/debian/patches/spelling.patch metaphlan2-2.9.19/debian/patches/spelling.patch
--- metaphlan2-2.7.8/debian/patches/spelling.patch      2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/spelling.patch     2019-08-15 13:12:57.000000000 -0700
@@ -2,29 +2,33 @@
 Last-Update: Mon, 23 May 2016 16:09:13 +0200
 Description: Spelling
 
---- a/README.md
-+++ b/README.md
-@@ -307,7 +307,7 @@ Post-mapping arguments:
- Additional analysis types and arguments:
-   -t ANALYSIS TYPE      Type of analysis to perform: 
-                          * rel_ab: profiling a metagenomes in terms of relative abundances
--                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads comming from each clade.
-+                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads coming from each clade.
-                          * reads_map: mapping from reads to clades (only reads hitting a marker)
-                          * clade_profiles: normalized marker counts for clades with at least a non-null marker
-                          * marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
-@@ -713,7 +713,7 @@ python ../strainphlan.py -h
- The default setting can be stringent for some cases where you have very few samples left in the phylogenetic tree. You can relax some parameters to add more samples back:
- 
- 1. *marker\_in\_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
--2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threhold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
-+2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threshold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
- 3. *N\_in\_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
- 4. *gap\_in\_sample*: The samples with full sequences concatenated from all markers and having the percentage of gaps greater than this threshold will be removed. Default 0.2. You can set this parameter to "0.5" to add some more samples.
- 5. *relaxed\_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to set: marker\_in\_clade=0.5, sample\_in\_marker=0.5, N\_in\_marker=0.5, gap\_in\_sample=0.5. Default "False".
---- a/strainphlan.py
-+++ b/strainphlan.py
-@@ -337,7 +337,7 @@ def read_params():
+Index: metaphlan2-2.9.19/README.md
+===================================================================
+--- metaphlan2-2.9.19.orig/README.md
++++ metaphlan2-2.9.19/README.md
+@@ -375,7 +375,7 @@
+ Additional analysis types and arguments:
+   -t ANALYSIS TYPE      Type of analysis to perform:
+                          * rel_ab: profiling a metagenomes in terms of relative abundances
+-                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads comming from each clade.
++                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads coming from each clade.
+                          * reads_map: mapping from reads to clades (only reads hitting a marker)
+                          * clade_profiles: normalized marker counts for clades with at least a non-null marker
+                          * marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
+@@ -733,7 +733,7 @@
+ The default setting can be stringent for some cases where you have very few samples left in the phylogenetic tree. You can relax some parameters to add more samples back:
+ 
+ 1. *marker\_in\_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+-2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threhold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
++2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threshold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+ 3. *N\_in\_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
+ 4. *gap\_in\_sample*: The samples with full sequences concatenated from all markers and having the percentage of gaps greater than this threshold will be removed. Default 0.2. You can set this parameter to "0.5" to add some more samples.
+ 5. *relaxed\_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to set: marker\_in\_clade=0.5, sample\_in\_marker=0.5, N\_in\_marker=0.5, gap\_in\_sample=0.5. Default "False".
+Index: metaphlan2-2.9.19/strainphlan.py
+===================================================================
+--- metaphlan2-2.9.19.orig/strainphlan.py
++++ metaphlan2-2.9.19/strainphlan.py
+@@ -338,7 +338,7 @@
          required=False,
          default=['all'],
          type=str,
@@ -33,9 +37,11 @@
                  'the marker alignments in fasta format and the phylogenetic '\
                  'trees. If a file name is specified, the clade list in that '\
                  'file where each clade name is on a line will be read.'
---- a/metaphlan2.py
-+++ b/metaphlan2.py
-@@ -596,7 +596,7 @@ def read_params(args):
+Index: metaphlan2-2.9.19/metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/metaphlan2.py
++++ metaphlan2-2.9.19/metaphlan2.py
+@@ -314,7 +314,7 @@
           default='rel_ab', help =
           "Type of analysis to perform: \n"
           " * rel_ab: profiling a metagenomes in terms of relative abundances\n"
diff -Nru metaphlan2-2.7.8/debian/rules metaphlan2-2.9.19/debian/rules
--- metaphlan2-2.7.8/debian/rules       2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/rules      2019-08-15 13:12:57.000000000 -0700
@@ -3,7 +3,7 @@
 # DH_VERBOSE := 1
 
 %:
-       dh $@  --with python2
+       dh $@  --with python3
 
 override_dh_auto_build:
        dh_auto_build
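
For reference, the shebang hunks in python3.patch above all make the same
substitution (an env-style "python" or "python2" interpreter line becomes
"python3"). A minimal sketch of that rewrite follows. It is not part of the
debdiff; the names SHEBANG_RE and port_shebang are hypothetical, and it assumes
the shebang is the very first line and uses the /usr/bin/env form seen in the
hunks.

```python
import re

# Assumption: only env-style shebangs on the first line are handled, as in
# the hunks above. Versioned names such as "python2.7" or an existing
# "python3" are deliberately left alone.
SHEBANG_RE = re.compile(r"\A#!/usr/bin/env python2?[ \t]*(?=\r?\n|\Z)")


def port_shebang(text):
    """Return text with a leading python/python2 env shebang set to python3."""
    return SHEBANG_RE.sub("#!/usr/bin/env python3", text, count=1)


if __name__ == "__main__":
    sample = "#!/usr/bin/env python2\n\nimport sys\n"
    print(port_shebang(sample), end="")
```

This only touches the interpreter line; any Python 2 syntax in the script body
still needs porting separately.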
