Hi Andreas,

I meant that the latest version of the repository fits the tutorial on the
repository. If you use the older version (with the old names), I am afraid
users will have some problems when following the tutorial.
Regarding the issue of separating code and data, I will discuss it with
Nicola next Monday and let you know.

Thanks,
Tin

On Fri, Aug 5, 2016 at 10:21 PM Andreas Tille <andr...@an3as.eu> wrote:

> Hi Tin,
>
> I have to admit that I cannot parse the information you gave in your mail.
>
> It is also not really connected to my next mail (which is archived here
>    https://lists.debian.org/debian-med/2016/08/msg00040.html ) about the
> separation of code and data.
>
> Kind regards
>
>       Andreas.
>
> On Thu, Aug 04, 2016 at 11:48:08AM +0000, Duy Tin Truong wrote:
> > Hi Andreas,
> >
> > If you can use the latest version with the name changes I mentioned, it
> > would fit better with the updated tutorial on the metaphlan2 repository.
> >
> > Thanks,
> > Tin
> >
> > On Thu, Aug 4, 2016 at 1:28 PM Nicola Segata <nicola.seg...@unitn.it> wrote:
> >
> > > Hi Andreas,
> > > Yes, it is likely that the code will be updated frequently, but the big
> > > database file will change only rarely (certainly no more often than
> > > once a year).
> > > thanks
> > > Nicola
> > >
> > > On Thu, Aug 4, 2016 at 12:46 PM Andreas Tille <andr...@an3as.eu> wrote:
> > >
> > >> Hi again,
> > >>
> > >> On Thu, Aug 04, 2016 at 08:10:29AM +0000, Nicola Segata wrote:
> > >> > Makes sense to me!
> > >>
> > >>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833388#15
> > >>
> > >> If you read the discussion, it seems that my suggestion to ship the
> > >> fasta file inside the Debian package and let the postinst do the
> > >> transformation step found some agreement, provided that the package
> > >> does not change so frequently that several uploads per month
> > >> happen.
> > >>
> > >> I'm now wondering what your estimated change rate for the metaphlan2
> > >> data files might be.  Do these change frequently?  Is there any chance
> > >> that the code changes frequently but the data files stay unchanged?
> > >>
> > >> Kind regards
> > >>
> > >>       Andreas.
> > >>
> > >> > On Thu, Aug 4, 2016 at 8:18 AM Andreas Tille <andr...@an3as.eu> wrote:
> > >> >
> > >> > > Hi Nicola,
> > >> > >
> > >> > > On Wed, Aug 03, 2016 at 08:51:33PM +0000, Nicola Segata wrote:
> > >> > > > Great, thanks Andreas. We provide the "*.bt2" files so that the user can
> > >> > > > run BowTie2 internally to MetaPhlAn directly without first building the
> > >> > > > indexes (it will take quite a bit of time).
> > >> > >
> > >> > > Fully agreed here.
> > >> > >
> > >> > > > Also, the indexes are smaller
> > >> > > > in size than the sequence file...
> > >> > >
> > >> > > Hmmm, all *.bt2 files sum up to 1,124,449kB while the fasta file has
> > >> > > only 753,081kB.  Considering the better compression performance of pure
> > >> > > text files, a compressed archive containing the fasta is drastically
> > >> > > smaller than one with the *.bt2 files.  Yesterday I tried to start a
> > >> > > discussion on how to deal with the size of the data inside Debian[1] (no
> > >> > > answer so far), and my experiment to create a source tarball just
> > >> > > containing the fasta resulted in a 270MB *xz* compressed file.  Well, xz
> > >> > > is better than gz, but let's say the compressed tarball with the fasta
> > >> > > is about 30% of the size of your current download of 1,017MB.
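
The size comparison above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the mail; it assumes that the download size written as "1.017MB" means 1,017 MB, and the constant names are made up for illustration.

```python
# Figures quoted in the mail above; the names are illustrative only.
BT2_TOTAL_KB = 1_124_449      # all *.bt2 index files together
FASTA_KB = 753_081            # markers.fasta, uncompressed
FASTA_XZ_MB = 270             # xz-compressed tarball containing only the fasta
CURRENT_DOWNLOAD_MB = 1_017   # assumption: "1.017MB" is read as 1,017 MB


def ratio(part, whole):
    """Return part/whole as a plain fraction."""
    return part / whole


# Uncompressed fasta relative to the uncompressed index files.
fasta_vs_bt2 = ratio(FASTA_KB, BT2_TOTAL_KB)

# xz-compressed fasta tarball relative to the current upstream download.
xz_vs_download = ratio(FASTA_XZ_MB, CURRENT_DOWNLOAD_MB)

print(f"fasta / bt2 (uncompressed): {fasta_vs_bt2:.1%}")
print(f"fasta.tar.xz / current download: {xz_vs_download:.1%}")
```

Under that reading the compressed fasta tarball comes out a little below the "about 30%" estimate given in the mail.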
> > >> > >
> > >> > > The situation for Debian is different from that of your users:  A user who
> > >> > > downloads from your website intends to run metaphlan2.  Amongst the
> > >> > > millions of Debian users only very few are interested in metaphlan2, and
> > >> > > we need to weigh how many resources we can spend.  It's not that
> > >> > > only Debian provides resources.  There is a large mirroring network that
> > >> > > spends lots of bandwidth and disk space for very little usage.  So in
> > >> > > this case it makes sense to put the effort on the user's side to
> > >> > > regenerate the indexes (or even download the data separately via a
> > >> > > script we could provide inside the package).  So I could imagine
> > >> > > packaging only the metaphlan2 code and providing a script that downloads
> > >> > > the data and puts it into the expected place.
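
A download helper of the kind suggested above might look roughly like the following. It is a minimal sketch, not the script that ended up in any package: the function name `fetch_data`, its parameters, and the choice of target directory are all assumptions, and the real data URL is deliberately left to the caller.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_data(url, dest_dir, filename):
    """Download `url` into dest_dir/filename unless it is already there.

    Hypothetical helper: the URL and the destination layout are supplied by
    the caller and are not taken from the metaphlan2 sources.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename
    if target.exists():
        # Data already in place, so no bandwidth is spent again.
        return target
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

Skipping the download when the file already exists keeps repeated runs cheap, which matters for a file of several hundred megabytes.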
> > >> > >
> > >> > > Kind regards
> > >> > >
> > >> > >          Andreas.
> > >> > >
> > >> > > [1] https://lists.alioth.debian.org/pipermail/debian-med-packaging/2016-August/044984.html
> > >> > >
> > >> > > > cheers
> > >> > > > Nicola
> > >> > > >
> > >> > > > On Wed, Aug 3, 2016 at 6:08 PM Andreas Tille <andr...@an3as.eu> wrote:
> > >> > > >
> > >> > > > > Hi Tin,
> > >> > > > >
> > >> > > > > On Wed, Aug 03, 2016 at 02:01:01PM +0000, Duy Tin Truong wrote:
> > >> > > > > > > - Tin can also provide more info about the binary data in db_v20. The
> > >> > > > > > > files ending with "bt2" are created using a script in the Bowtie2
> > >> > > > > > > package (bowtie2-build) using a sequence file Tin can provide (it can
> > >> > > > > > > also be recovered from the bt2 files with bowtie2-inspect if I
> > >> > > > > > > remember well).
> > >> > > > > > As Nicola said, those files in db_v20 are created with bowtie2-build
> > >> > > > > > using a sequence file and you can recover the sequence file by:
> > >> > > > > >
> > >> > > > > > bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
> > >> > > > > >
> > >> > > > > > If you want to rebuild them, the command is:
> > >> > > > > >
> > >> > > > > > bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
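
The two commands above could be wrapped for use in a build or maintainer script along these lines. This is a sketch that assumes bowtie2-inspect and bowtie2-build are on the PATH; the helper names are invented here and only mirror the invocations quoted in the mail.

```python
import subprocess


def inspect_command(index_base):
    """Argument list for dumping the sequences of an index to stdout."""
    return ["bowtie2-inspect", index_base]


def build_command(fasta_path, index_base):
    """Argument list for building an index from a FASTA file."""
    return ["bowtie2-build", fasta_path, index_base]


def recover_fasta(index_base, fasta_path):
    """Run bowtie2-inspect and redirect its output into fasta_path."""
    with open(fasta_path, "wb") as out:
        subprocess.run(inspect_command(index_base), stdout=out, check=True)


def rebuild_index(fasta_path, index_base):
    """Run bowtie2-build; raises CalledProcessError if the tool fails."""
    subprocess.run(build_command(fasta_path, index_base), check=True)
```

For example, `recover_fasta("metaphlan2/db_v20/mpa_v20_m200", "metaphlan2/markers.fasta")` corresponds to the first command, and `rebuild_index` to the second.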
> > >> > > > >
> > >> > > > > I can confirm that I can reproduce the files byte-identically from
> > >> > > > > markers.fasta.  Is there any reason to ship the binary form instead of
> > >> > > > > the fasta text file?  Moreover, what is the source of the markers.fasta?
> > >> > > > > Is there any related publication or so?
> > >> > > > >
> > >> > > > > > > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > >> > > > > > > object (or he can provide a couple of lines of code to uncompress it?)
> > >> > > > > > It is a Python dictionary and can be read as:
> > >> > > > > >
> > >> > > > > > import cPickle as pickle
> > >> > > > > > import bz2
> > >> > > > > > db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))
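
The snippet above uses the Python 2 module cPickle. A Python 3 variant might look like the sketch below; the function names are made up, and passing encoding="latin1" is an assumption that is commonly needed when a pickle written by Python 2 contains byte strings.

```python
import bz2
import pickle


def load_marker_db(path, encoding="latin1"):
    """Read a bz2-compressed pickle and return the stored object.

    encoding="latin1" is an assumption for pickles written by Python 2;
    it has no effect on pickles written by Python 3.
    """
    with bz2.open(path, "rb") as handle:
        return pickle.load(handle, encoding=encoding)


def save_marker_db(db, path):
    """Write an object as a bz2-compressed pickle (protocol 2 for Python 2 readers)."""
    with bz2.open(path, "wb") as handle:
        pickle.dump(db, handle, protocol=2)
```

With the file from the mail this would be called as `db = load_marker_db('db_v20/mpa_v20_m200.pkl')`, after which `type(db)` and `len(db)` give a first impression of the dictionary.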
> > >> > > > > >
> > >> > > > > > You can find more information about them at:
> > >> > > > > >
> > >> > > > > > https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
> > >> > > > >
> > >> > > > > OK, that page clarifies the method.  Just a personal remark from the
> > >> > > > > point of view of an outsider to bioinformatics:  I'd regard the creation
> > >> > > > > process of the mpa_v20_m200.pkl file as a bit cumbersome.  I'd personally
> > >> > > > > prefer dropping some text record somewhere and calling a script that
> > >> > > > > processes this record rather than writing my own script.
> > >> > > > >
> > >> > > > > > In addition, some files were renamed:
> > >> > > > > >    - metaphlan2_strainer.py -> strainphlan.py
> > >> > > > > >    - strainer_src -> strainphlan_src
> > >> > > > > >    - strainer_tutorial -> strainphlan_tutorial
> > >> > > > > >
> > >> > > > > > Some source files were updated as well.
> > >> > > > > > Please let me know if you need other information.
> > >> > > > >
> > >> > > > > Just drop me a note once you release a new version containing these
> > >> > > > > changes.  I think I'll try to release the current version as is, since at
> > >> > > > > least the origin of the files is clarified now.  I'm not yet sure whether
> > >> > > > > the size of the data is acceptable or might exceed some limit.  Regarding
> > >> > > > > this, I'm wondering whether I should create a source tarball that includes
> > >> > > > > markers.fasta instead and create the bt2 files in the build process.
> > >> > > > >
> > >> > > > > Kind regards
> > >> > > > >
> > >> > > > >        Andreas.
> > >> > > > >
> > >> > > > > --
> > >> > > > > http://fam-tille.de
> > >> > > > >
> > >> > >
> > >> > > --
> > >> > > http://fam-tille.de
> > >> > >
> > >>
> > >> --
> > >> http://fam-tille.de
> > >>
> > >
>
> --
> http://fam-tille.de
>
