Re: [Biohaskell] Hoping to contribute to BioHaskell for a student project

Christian Höner zu Siederdissen Thu, 20 Oct 2011 03:25:50 -0700

Hi,

1)
http://hackage.haskell.org/packages/archive/bio/0.5.0.1/doc/html/Bio-Alignment-AAlign.html


2)
Blast and other programs do sequence alignment as well and spit out
their results in such formats. You don't need to look into these -- but
you'll see the algorithms in later bioinformatics courses.

3)
Yes, Needleman-Wunsch is a prime candidate for a first algorithm. It
should basically look like this (in Haskell notation):

needlemanWunsch :: M -> G -> S -> S -> (Score,[(Int,Int)])

You need a scoring matrix M, gap costs G and two input sequences S. You
return the alignment score Score and a list of cells [(Int,Int)] that
produce the score.
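Below is a minimal sketch of that signature. It makes three assumptions. A scoring function stands in for an explicit matrix (type M = Char -> Char -> Score). G is a linear gap cost subtracted once per gap. S is a plain String. These type synonyms are illustrative only and are not part of the bio package. The sketch fills a lazy Data.Array table, then backtracks from the bottom-right cell, preferring the diagonal on ties. It therefore returns one of possibly several optimal paths, as cells running from (0,0) to the bottom-right corner.

```haskell
import Data.Array

-- Hypothetical stand-ins for M, G, S and Score from the signature above.
type Score = Int
type S     = String                 -- an input sequence
type G     = Int                    -- linear gap cost, subtracted per gap
type M     = Char -> Char -> Score  -- scoring "matrix" as a function

needlemanWunsch :: M -> G -> S -> S -> (Score, [(Int, Int)])
needlemanWunsch m g xs ys = (table ! (n, k), reverse (traceback (n, k)))
  where
    n  = length xs
    k  = length ys
    xa = listArray (1, n) xs
    ya = listArray (1, k) ys
    -- Lazy array: each cell refers to its neighbours, and laziness
    -- evaluates them in dependency order.
    table :: Array (Int, Int) Score
    table = array ((0, 0), (n, k))
      [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. k] ]
    cell 0 j = negate g * j
    cell i 0 = negate g * i
    cell i j = maximum
      [ table ! (i-1, j-1) + m (xa ! i) (ya ! j)  -- match / mismatch
      , table ! (i-1, j)   - g                    -- gap in ys
      , table ! (i,   j-1) - g ]                  -- gap in xs
    -- Walk back from (n,k) to (0,0), recording every cell on one optimal path.
    traceback (0, 0) = [(0, 0)]
    traceback (i, j)
      | i > 0 && j > 0 && table ! (i, j) == table ! (i-1, j-1) + m (xa ! i) (ya ! j)
                  = (i, j) : traceback (i-1, j-1)
      | i > 0 && table ! (i, j) == table ! (i-1, j) - g
                  = (i, j) : traceback (i-1, j)
      | otherwise = (i, j) : traceback (i, j-1)
```

For example, needlemanWunsch (\a b -> if a == b then 1 else -1) 1 "GATTACA" "GCATGCU" gives a score of 0.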

Gruss,
Christian

* Kenneth Lui <[email protected]> [20.10.2011 12:08]:
>    Global alignment does sound like an interesting subject!
>    I have a few questions,
>    1) Does BioHaskell already have any global alignment code? If so, in
>    which package?
>    2) From the biohaskell wiki, I found the following information about
>    alignment: "Supported alignment formats: ACE, BlastXML, PSL (Blat),
>    Bowtie, Soap, GFF3, BED." and "The various alignment formats (Blast, PSL,
>    ACE) should be standardized and better integrated." How are they related
>    to the algorithm that I should implement? E.g., what input/output
>    formats should it support?
>    3) Wikipedia mentions the Needleman-Wunsch algorithm; is it
>    what I am supposed to implement?
>    As for algebraic dynamic programming, a professor of mine has mentioned
>    it as well; hopefully I can dive further into it.
>    Thanks Christian for all the info!
>    Cheers,
>    Kenneth
> 
>    On Thu, Oct 20, 2011 at 02:32, Christian Höner zu Siederdissen
>    <[email protected]> wrote:
> 
>      * Ketil Malde <[email protected]> [20.10.2011 09:54]:
>      >
>      > One important piece of functionality that could be ripped from biolib
>      > and made into a separate library is the BLAST output parser^1. This
>      > could also do with some cleanup, and would make a nice, standalone
>      > project. It's also fairly open-ended. If you're more interested in
>      > algorithms, there's some stuff for sequence alignments that I was never
>      > quite satisfied with.
>      >
>      > ^1 Christian, didn't you do something on this?
> 
>      Yeah, I completely forgot about that. The bits and pieces I have, once I
>      find them again ;-), are iteratee-code, however. Not the best thing to
>      start Haskell with. On the other hand, once you understand that stuff
>      you know a lot of high-level Haskell in addition to how to make Haskell
>      fast...
> 
>      ==
> 
>      As a student project, the second idea on sequence alignments seems to be
>      more fun, though, and it would be useful on its own. The sequence
>      alignment stuff can be done in a month, as the algorithms are not that
>      complicated and you mostly just need to know Haskell arrays.
> 
>      These are possible tasks:
> 
>      - global alignment
>      - (backtracking)
>      - local alignment
>      - high-performance code
>       - unboxed arrays
>       - vector-based fusion operations
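For the local alignment task, the usual choice is the Smith-Waterman recurrence. It is the global recurrence with every cell floored at 0, and the answer is the best cell anywhere in the table rather than the bottom-right corner. The following is a minimal score-only sketch. It assumes a hypothetical scoring function and a linear gap cost, as in a plain Needleman-Wunsch version. Recovering the local alignment itself would need a backtracking step that starts at the best cell and stops at the first 0.

```haskell
import Data.Array

-- Hypothetical parameters: a scoring function m and a linear gap cost g.
smithWaterman :: (Char -> Char -> Int) -> Int -> String -> String -> Int
smithWaterman m g xs ys = maximum (elems table)  -- best cell anywhere
  where
    n  = length xs
    k  = length ys
    xa = listArray (1, n) xs
    ya = listArray (1, k) ys
    table :: Array (Int, Int) Int
    table = array ((0, 0), (n, k))
      [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. k] ]
    -- First row and column are 0: a local alignment may start anywhere.
    cell 0 _ = 0
    cell _ 0 = 0
    cell i j = maximum
      [ 0                                          -- start a fresh alignment
      , table ! (i-1, j-1) + m (xa ! i) (ya ! j)   -- match / mismatch
      , table ! (i-1, j)   - g                     -- gap in ys
      , table ! (i,   j-1) - g ]                   -- gap in xs
```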
> 
>      I put backtracking in brackets as there are two interesting ways to do
>      alignments: have a forward pass calculating scores and find out via
>      backtracking which alignments produce this score, or use something
>      like "algebraic dynamic programming" (Giegerich et al.) to do it all in
>      one pass.
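The sketch below is not algebraic dynamic programming. It only illustrates the "one pass" idea in its simplest form: each cell carries its best score together with one alignment that achieves it, so no separate backtracking pass is needed. The function name and the '-' gap symbol are assumptions made for this example. Because the per-cell alignments share their tails, the extra memory stays modest.

```haskell
import Data.Array
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Each cell holds (best score, alignment columns in reverse order).
-- '-' marks a gap (hypothetical convention for this sketch).
alignOnePass :: (Char -> Char -> Int) -> Int -> String -> String
             -> (Int, [(Char, Char)])
alignOnePass m g xs ys = fmap reverse (table ! (n, k))  -- columns in order
  where
    n  = length xs
    k  = length ys
    xa = listArray (1, n) xs
    ya = listArray (1, k) ys
    table :: Array (Int, Int) (Int, [(Char, Char)])
    table = array ((0, 0), (n, k))
      [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. k] ]
    cell 0 0 = (0, [])
    cell 0 j = extend (table ! (0, j-1)) (negate g) ('-', ya ! j)
    cell i 0 = extend (table ! (i-1, 0)) (negate g) (xa ! i, '-')
    cell i j =
      let a = xa ! i
          b = ya ! j
      in maximumBy (comparing fst)
           [ extend (table ! (i-1, j-1)) (m a b)    (a, b)    -- match / mismatch
           , extend (table ! (i-1, j))   (negate g) (a, '-')  -- gap in ys
           , extend (table ! (i,   j-1)) (negate g) ('-', b) ]  -- gap in xs
    -- Add a score delta and prepend one alignment column.
    extend (s, cols) d col = (s + d, col : cols)
```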
> 
>      The order of tasks above should allow you to stop at any point and have
>      something to show that would be useful later on. Basically, if you
>      write that and stop after the vector-fusion operations, you come close
>      to C code in terms of performance.
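For the unboxed-array step, one option is to fill the same global recurrence into a mutable STUArray from the array package and freeze it with runSTUArray. That way cells are stored as unboxed Int values instead of lazy thunks. The following is a score-only sketch under that assumption. The function name is hypothetical, and no performance figures are claimed here.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, listArray, (!))

-- Global alignment score only, computed in an unboxed mutable table.
nwScoreUnboxed :: (Char -> Char -> Int) -> Int -> String -> String -> Int
nwScoreUnboxed m g xs ys = table ! (n, k)
  where
    n  = length xs
    k  = length ys
    xa = listArray (1, n) xs :: UArray Int Char
    ya = listArray (1, k) ys :: UArray Int Char
    table :: UArray (Int, Int) Int
    table = runSTUArray $ do
      t <- newArray ((0, 0), (n, k)) 0
      -- Boundary: aligning a prefix against nothing costs one gap per symbol.
      forM_ [0 .. n] $ \i -> writeArray t (i, 0) (negate g * i)
      forM_ [0 .. k] $ \j -> writeArray t (0, j) (negate g * j)
      -- Fill row by row, so all three neighbours are already written.
      forM_ [1 .. n] $ \i ->
        forM_ [1 .. k] $ \j -> do
          d <- readArray t (i-1, j-1)
          u <- readArray t (i-1, j)
          l <- readArray t (i, j-1)
          writeArray t (i, j) (maximum [d + m (xa ! i) (ya ! j), u - g, l - g])
      return t
```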
> 
>      Anyway, start with global alignment: you can basically read a book
>      chapter on it on day 1, then code it using arrays in 1-2 days
>      (depending on what you know about Haskell).
> 
>      Gruss,
>      Christian
> 
>    --
>    Kenneth Lui

> _______________________________________________
> Biohaskell mailing list
> [email protected]
> http://malde.org/cgi-bin/mailman/listinfo/biohaskell
