Hi Tin,

On Wed, Aug 03, 2016 at 02:01:01PM +0000, Duy Tin Truong wrote:
> > - Tin can also provide more info about the binary data in db_v20. The files
> >   ending with "bt2" are created using a script in the Bowtie2 package
> >   (bowtie2-build) using a sequence file Tin can provide (it can also be
> >   recovered from the bt2 files with bowtie2-inspect if I remember well).
> As Nicola said, those files in db_v20 are created with bowtie2-build
> using a sequence file and you can recover the sequence file by:
>
> bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
>
> If you want to rebuild them, the command is:
>
> bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
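
One way to check whether index files rebuilt with the commands quoted above match the shipped ones is to compare checksums file by file. The following is only a minimal sketch: the helper names sha256_of and compare_indexes are made up for illustration, and the two prefixes are simply the ones from the quoted commands, so they may need adjusting to the local layout.

```python
import glob
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_indexes(shipped_prefix, rebuilt_prefix):
    """Map each *.bt2 suffix found under shipped_prefix to True if the
    file with the same suffix under rebuilt_prefix has the same digest."""
    results = {}
    for shipped in sorted(glob.glob(glob.escape(shipped_prefix) + "*.bt2")):
        suffix = shipped[len(shipped_prefix):]
        rebuilt = rebuilt_prefix + suffix
        try:
            results[suffix] = sha256_of(shipped) == sha256_of(rebuilt)
        except OSError:
            # Rebuilt file is missing or unreadable: count it as a mismatch.
            results[suffix] = False
    return results


if __name__ == "__main__":
    # Assumed prefixes, taken from the quoted commands.
    report = compare_indexes("metaphlan2/db_v20/mpa_v20_m200",
                             "metaphlan2/db_v21/mpa_v21_m200")
    for suffix, same in report.items():
        print(suffix, "identical" if same else "DIFFERENT")
```

Comparing digests instead of file sizes or timestamps means that a "True" for every suffix indicates the rebuilt files have the same content as the shipped ones.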

I can confirm that I can reproduce the files byte-identically from
markers.fasta. Is there any reason to ship the binary form instead of
the fasta text file? Moreover, what is the source of markers.fasta?
Is there any related publication?

> > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > object (or he can provide a couple of lines of code to uncompress it?)
> It is python dictionary and can be read as:
>
> import cPickle as pickle
> import bz2
> db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))
>
> You can have more information about them at:
> https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database

OK, that page clarifies the method. Just a personal remark from the
point of view of an outsider to bioinformatics: I find the creation
process of the mpa_v20_m200.pkl file a bit cumbersome. I would
personally prefer dropping a text record somewhere and calling a script
that processes this record, rather than writing my own script.

> In addition, some files were changed the names:
> - metaphlan2_strainer.py -> strainphlan.py
> - strainer_src -> strainphlan_src
> - strainer_tutorial -> strainphlan_tutorial
>
> Some source files were updated as well.
> Please let me know if you need other information.

Just drop me a note once you release a new version containing these
changes. I think I'll try to release the current version as is, since
at least the origin of the files is clarified now. I'm not yet sure
whether the size of the data is acceptable or might exceed some limit.
In this regard I'm wondering whether I should create a source tarball
that includes markers.fasta instead and create the bt2 files in the
build process.

Kind regards

      Andreas.

-- 
http://fam-tille.de