Hi Andreas,

I meant that the latest version of the repository matches the tutorial in the repository. If you use the older version (with the old names), I am afraid users will run into problems when following the tutorial. Regarding the separation of code and data, I will discuss it with Nicola next Monday and let you know.
Thanks,
Tin

On Fri, Aug 5, 2016 at 10:21 PM Andreas Tille <andr...@an3as.eu> wrote:

> Hi Tin,
>
> I must admit that I cannot parse the information you gave in your mail.
> It is also not really connected to my next mail (which is archived at
> https://lists.debian.org/debian-med/2016/08/msg00040.html ) about the
> separation of code and data.
>
> Kind regards
>
> Andreas.
>
> On Thu, Aug 04, 2016 at 11:48:08AM +0000, Duy Tin Truong wrote:
> > Hi Andreas,
> >
> > If you can use the latest version with the name changes I mentioned, it
> > would fit better with the updated tutorial on the metaphlan2 repository.
> >
> > Thanks,
> > Tin
> >
> > On Thu, Aug 4, 2016 at 1:28 PM Nicola Segata <nicola.seg...@unitn.it> wrote:
> >
> > > Hi Andreas,
> > > yes, it is likely that the code will be frequently updated, but the big
> > > database file will change only rarely (for sure no more frequently than
> > > once a year).
> > > thanks
> > > Nicola
> > >
> > > On Thu, Aug 4, 2016 at 12:46 PM Andreas Tille <andr...@an3as.eu> wrote:
> > >
> > >> Hi again,
> > >>
> > >> On Thu, Aug 04, 2016 at 08:10:29AM +0000, Nicola Segata wrote:
> > >> > Makes sense to me!
> > >>
> > >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833388#15
> > >>
> > >> If you read the discussion, it seems that my suggestion to ship the fasta
> > >> file inside the Debian package and let the postinst do the
> > >> transformation step found some agreement, provided that the package does
> > >> not change so often that several uploads per month would happen.
> > >>
> > >> I'm now wondering what your estimated change rate for the metaphlan2
> > >> data files might be. Do these change frequently? Is there any chance
> > >> that the code changes frequently but the data files stay unchanged?
> > >>
> > >> Kind regards
> > >>
> > >> Andreas.
> > >>
> > >> > On Thu, Aug 4, 2016 at 8:18 AM Andreas Tille <andr...@an3as.eu> wrote:
> > >> >
> > >> > > Hi Nicola,
> > >> > >
> > >> > > On Wed, Aug 03, 2016 at 08:51:33PM +0000, Nicola Segata wrote:
> > >> > > > Great, thanks Andreas. We provide the "*.bt2" files so that the user can
> > >> > > > run BowTie2 internally to MetaPhlAn directly without first building the
> > >> > > > indexes (it will take quite a bit of time).
> > >> > >
> > >> > > Fully agreed here.
> > >> > >
> > >> > > > Also, the indexes are smaller
> > >> > > > in size than the sequence file...
> > >> > >
> > >> > > Hmmm, all *.bt2 files sum up to 1,124,449 kB while the fasta file has
> > >> > > only 753,081 kB. Considering the better compression performance of pure
> > >> > > text files, a compressed archive containing the fasta is drastically
> > >> > > smaller than one with the *.bt2 files. Yesterday I tried to start a
> > >> > > discussion on how to deal with the size of the data inside Debian[1] (no
> > >> > > answer so far), and my experiment to create a source tarball containing
> > >> > > just the fasta resulted in a 270 MB *xz*-compressed file. (Well, xz
> > >> > > is better than gz, but let's say the compressed tarball with the fasta is
> > >> > > about 30% of the size of your current download of 1,017 MB.)
> > >> > >
> > >> > > The situation for Debian is different from that of your users: a user who
> > >> > > downloads from your website intends to run metaphlan2. Amongst the
> > >> > > millions of Debian users only very few are interested in metaphlan2, and
> > >> > > we need to weigh up how many resources we can spend. It's not that
> > >> > > only Debian provides resources. There is a large mirroring network that
> > >> > > spends lots of bandwidth and disk space for very little usage.
> > >> > > So in this case it makes sense to put the effort on the user's side to
> > >> > > regenerate the indexes (or even download the data separately via a
> > >> > > script we could provide inside the package). So I could imagine
> > >> > > packaging only the metaphlan2 code and providing a script that downloads
> > >> > > the data and puts it into the expected place.
> > >> > >
> > >> > > Kind regards
> > >> > >
> > >> > > Andreas.
> > >> > >
> > >> > > [1] https://lists.alioth.debian.org/pipermail/debian-med-packaging/2016-August/044984.html
> > >> > >
> > >> > > > cheers
> > >> > > > Nicola
> > >> > > >
> > >> > > > On Wed, Aug 3, 2016 at 6:08 PM Andreas Tille <andr...@an3as.eu> wrote:
> > >> > > >
> > >> > > > > Hi Tin,
> > >> > > > >
> > >> > > > > On Wed, Aug 03, 2016 at 02:01:01PM +0000, Duy Tin Truong wrote:
> > >> > > > > > > - Tin can also provide more info about the binary data in db_v20. The files
> > >> > > > > > >   ending with "bt2" are created using a script in the Bowtie2 package
> > >> > > > > > >   (bowtie2-build) using a sequence file Tin can provide (it can also be
> > >> > > > > > >   recovered from the bt2 files with bowtie2-inspect if I remember well).
> > >> > > > > > As Nicola said, those files in db_v20 are created with bowtie2-build
> > >> > > > > > using a sequence file, and you can recover the sequence file with:
> > >> > > > > >
> > >> > > > > >     bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
> > >> > > > > >
> > >> > > > > > If you want to rebuild them, the command is:
> > >> > > > > >
> > >> > > > > >     bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
> > >> > > > >
> > >> > > > > I can confirm that I can reproduce the files byte-identically from
> > >> > > > > markers.fasta.
> > >> > > > > Is there any reason to ship the binary form instead of
> > >> > > > > the fasta text file? Moreover, what is the source of the markers.fasta?
> > >> > > > > Is there any related publication or so?
> > >> > > > >
> > >> > > > > > > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > >> > > > > > > object (or he can provide a couple of lines of code to uncompress it?)
> > >> > > > > > It is a Python dictionary and can be read as:
> > >> > > > > >
> > >> > > > > >     import cPickle as pickle
> > >> > > > > >     import bz2
> > >> > > > > >     db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))
> > >> > > > > >
> > >> > > > > > You can find more information about them at:
> > >> > > > > >
> > >> > > > > > https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
> > >> > > > >
> > >> > > > > OK, that page clarifies the method. Just a personal remark from the
> > >> > > > > point of view of an outsider to bioinformatics: I'd regard the creation
> > >> > > > > process of the mpa_v20_m200.pkl file as a bit cumbersome. I'd personally
> > >> > > > > prefer dropping some text record somewhere and calling a script that
> > >> > > > > processes this record rather than writing my own script.
> > >> > > > >
> > >> > > > > > In addition, some files were renamed:
> > >> > > > > > - metaphlan2_strainer.py -> strainphlan.py
> > >> > > > > > - strainer_src -> strainphlan_src
> > >> > > > > > - strainer_tutorial -> strainphlan_tutorial
> > >> > > > > >
> > >> > > > > > Some source files were updated as well.
> > >> > > > > > Please let me know if you need other information.
> > >> > > > >
> > >> > > > > Just drop me a note once you release a new version containing these
> > >> > > > > changes. I think I'll try to release the current version as is, since at
> > >> > > > > least the origin of the files is clarified now.
> > >> > > > > I'm not yet sure whether
> > >> > > > > the size of the data is acceptable or might exceed some limit.
> > >> > > > > Regarding this, I'm wondering whether I should create a source tarball that
> > >> > > > > includes markers.fasta instead and create the bt2 files in the build process.
> > >> > > > >
> > >> > > > > Kind regards
> > >> > > > >
> > >> > > > > Andreas.
> > >> > > > >
> > >> > > > > --
> > >> > > > > http://fam-tille.de
> > >> > > > >
> > >> > > >
> > >> > > --
> > >> > > http://fam-tille.de
> > >> > >
> > >> >
> > >> --
> > >> http://fam-tille.de
> > >>
> > >
>
> --
> http://fam-tille.de
>
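
The cPickle snippet quoted in the thread targets Python 2. As a minimal sketch only, the same read under Python 3 might look like the block below. It assumes the .pkl file is a bz2-compressed pickle written by Python 2, which is why encoding="latin1" is passed. The function name load_marker_db is hypothetical and not part of metaphlan2.

```python
import bz2
import pickle


def load_marker_db(path):
    """Return the object stored in a bz2-compressed pickle file.

    Hypothetical helper for illustration; not part of metaphlan2.
    Only unpickle files that come from a trusted source.
    """
    with bz2.open(path, "rb") as handle:
        # Assumption: the file was pickled under Python 2, so str values
        # need encoding="latin1" to be readable from Python 3.
        return pickle.load(handle, encoding="latin1")
```

Calling load_marker_db('db_v20/mpa_v20_m200.pkl') should then return the dictionary described in the thread, provided the file exists at that path. Its internal layout is documented on the bitbucket page linked in the thread and is not assumed here.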