Re: [Biohaskell] Hoping to contribute to BioHaskell for a student project

Kenneth Lui Fri, 21 Oct 2011 01:39:29 -0700

I am trying to understand the code in the Alignment module. Since AAlign
already has a global_align function, how is that different from the
"needlemanWunsch :: M -> G -> S -> S -> (Score,[(Int,Int)])" that you
mentioned? Or I should look for ways to improve that? Thanks!


Cheers,
Kenneth

On Thu, Oct 20, 2011 at 03:25, Christian Höner zu Siederdissen <
[email protected]> wrote:

> Hi,
>
> 1)
>
> http://hackage.haskell.org/packages/archive/bio/0.5.0.1/doc/html/Bio-Alignment-AAlign.html
>
> 2)
> Blast and other programs do sequence alignment as well and spit out
> their results in such formats. You don't need to look into these -- but
> you'll see the algorithms in later bioinformatics courses.
>
> 3)
> Yes, Needleman-Wunsch is a prime candidate for a first algorithm. It
> should basically look like this (in Haskell notation):
>
> needlemanWunsch :: M -> G -> S -> S -> (Score,[(Int,Int)])
>
> You need a scoring matrix M, gap costs G and two input sequences S. You
> return the alignment score Score and a list of cells [(Int,Int)] that
> produce the score.
>
> Gruss,
> Christian
>
> * Kenneth Lui <[email protected]> [20.10.2011 12:08]:
> >    Global alignment does sound like an interesting subject!
> >    I have a few questions,
> >    1) does biohaskell already have any globabl alignment code? If so, in
> >    which package?
> >    2) From the biohaskell wiki, I found the following information about
> >    alignment: "Supported alignment formats: ACE, BlastXML, PSL (Blat),
> >    Bowtie, Soap, GFF3, BED." and "The various alignment formats (Blast,
> PSL,
> >    ACE) should be standardized and better integrated." How are they
> related
> >    to the algorithm that I should implement. e.g. what are the
> input/output
> >    format that it should support?
> >    3) From the wikepedia, it mentions the Needleman*Wunsch algorithm, is
> it
> >    what I am supposed to implement?
> >    For algebraic dynamic programming, a professor of mine has mentioned
> it as
> >    well, hopefully I can dive further in it if possible.
> >    Thanks Christian for all the info!
> >    Cheers,
> >    Kenneth
> >
> >    On Thu, Oct 20, 2011 at 02:32, Christian Ho:ner zu Siederdissen
> >    <[email protected]> wrote:
> >
> >      * Ketil Malde <[email protected]> [20.10.2011 09:54]:
> >      >
> >      > One imortant piece of functionality that could be ripped from
> biolib
> >      and
> >      > made into a separate library, is the BLAST output parser^1.  This
> >      could
> >      > also do with some cleanup, and would make a nice, standalone
> project.
> >      > It's also fairly open-ended.  If you're more interested in
> algorithms,
> >      > there's some stuff for sequence alignments that I was never quite
> >      > satisfied with.
> >      >
> >      > ^1 Christian, didn't you do something on this?
> >
> >      Yeah, I completely forgot about that. The bits and pieces I have,
> once I
> >      find them again ;-), are iteratee-code, however. Not the best thing
> to
> >      start Haskell with. On the other hand, once you understand that
> stuff
> >      you know a lot of high-level Haskell in addition to how to make
> Haskell
> >      fast...
> >
> >      ==
> >
> >      As a student project, the second idea on sequence alignments seems
> to be
> >      more fun, though. And it would be useful in on its own. The sequence
> >      alignment stuff can be done in a month as the algorithms are not
> that
> >      complicated and you mostly just need to know Haskell arrays.
> >
> >      These are possible tasks:
> >
> >      - global alignment
> >      - (backtracking)
> >      - local alignment
> >      - high-performance code
> >       - unboxed arrays
> >       - vector-based fusion operations
> >
> >      If put backtracking in brackets as there are two interesting ways on
> >      how to do alignments: have a forward pass calculating scores and
> find
> >      out via backtracking what alignments produce this score. Or use
> s.th.
> >      like "algebraic dynamic programming" (Giegerich et al) to do it all
> in
> >      one pass.
> >
> >      The order of tasks above should allow to stop at any point and have
> >      something to show, that would be useful later on. Basically, if you
> >      write that and stop after vector-fusion operations you come close to
> >      C-code in terms of performance.
> >
> >      Anyway, start with global alignment, you can basically read a book
> >      chapter on that on day 1, code it using arrays in 1-2 days
> thereafter
> >      (depending on what you know about Haskell)
> >
> >      Gruss,
> >      Christian
> >
> >    --
> >    Kenneth Lui
>
> > _______________________________________________
> > Biohaskell mailing list
> > [email protected]
> > http://malde.org/cgi-bin/mailman/listinfo/biohaskell
>
>


-- 
*Kenneth Lui*
Computer Science Major (Class of 2012)
University of British Columbia, Vancouver, Canada
Email: [email protected]
Phone number: 778 - 319 - 4321 (Canada)
Website: http://ca.linkedin.com/in/hkkenneth

_______________________________________________
Biohaskell mailing list
[email protected]
http://malde.org/cgi-bin/mailman/listinfo/biohaskell

Re: [Biohaskell] Hoping to contribute to BioHaskell for a student project

Reply via email to