I am trying to understand the code in the Alignment module. Since AAlign already has a global_align function, how is that different from the "needlemanWunsch :: M -> G -> S -> S -> (Score,[(Int,Int)])" that you mentioned? Or I should look for ways to improve that? Thanks!
Cheers, Kenneth On Thu, Oct 20, 2011 at 03:25, Christian Höner zu Siederdissen < [email protected]> wrote: > Hi, > > 1) > > http://hackage.haskell.org/packages/archive/bio/0.5.0.1/doc/html/Bio-Alignment-AAlign.html > > 2) > Blast and other programs do sequence alignment as well and spit out > their results in such formats. You don't need to look into these -- but > you'll see the algorithms in later bioinformatics courses. > > 3) > Yes, Needleman-Wunsch is a prime candidate for a first algorithm. It > should basically look like this (in Haskell notation): > > needlemanWunsch :: M -> G -> S -> S -> (Score,[(Int,Int)]) > > You need a scoring matrix M, gap costs G and two input sequences S. You > return the alignment score Score and a list of cells [(Int,Int)] that > produce the score. > > Gruss, > Christian > > * Kenneth Lui <[email protected]> [20.10.2011 12:08]: > > Global alignment does sound like an interesting subject! > > I have a few questions, > > 1) does biohaskell already have any globabl alignment code? If so, in > > which package? > > 2) From the biohaskell wiki, I found the following information about > > alignment: "Supported alignment formats: ACE, BlastXML, PSL (Blat), > > Bowtie, Soap, GFF3, BED." and "The various alignment formats (Blast, > PSL, > > ACE) should be standardized and better integrated." How are they > related > > to the algorithm that I should implement. e.g. what are the > input/output > > format that it should support? > > 3) From the wikepedia, it mentions the Needleman*Wunsch algorithm, is > it > > what I am supposed to implement? > > For algebraic dynamic programming, a professor of mine has mentioned > it as > > well, hopefully I can dive further in it if possible. > > Thanks Christian for all the info! > > Cheers, > > Kenneth > > > > On Thu, Oct 20, 2011 at 02:32, Christian Ho:ner zu Siederdissen > > <[email protected]> wrote: > > > > * Ketil Malde <[email protected]> [20.10.2011 09:54]: > > > > > > One imortant piece of functionality that could be ripped from > biolib > > and > > > made into a separate library, is the BLAST output parser^1. This > > could > > > also do with some cleanup, and would make a nice, standalone > project. > > > It's also fairly open-ended. If you're more interested in > algorithms, > > > there's some stuff for sequence alignments that I was never quite > > > satisfied with. > > > > > > ^1 Christian, didn't you do something on this? > > > > Yeah, I completely forgot about that. The bits and pieces I have, > once I > > find them again ;-), are iteratee-code, however. Not the best thing > to > > start Haskell with. On the other hand, once you understand that > stuff > > you know a lot of high-level Haskell in addition to how to make > Haskell > > fast... > > > > == > > > > As a student project, the second idea on sequence alignments seems > to be > > more fun, though. And it would be useful in on its own. The sequence > > alignment stuff can be done in a month as the algorithms are not > that > > complicated and you mostly just need to know Haskell arrays. > > > > These are possible tasks: > > > > - global alignment > > - (backtracking) > > - local alignment > > - high-performance code > > - unboxed arrays > > - vector-based fusion operations > > > > If put backtracking in brackets as there are two interesting ways on > > how to do alignments: have a forward pass calculating scores and > find > > out via backtracking what alignments produce this score. Or use > s.th. > > like "algebraic dynamic programming" (Giegerich et al) to do it all > in > > one pass. > > > > The order of tasks above should allow to stop at any point and have > > something to show, that would be useful later on. Basically, if you > > write that and stop after vector-fusion operations you come close to > > C-code in terms of performance. > > > > Anyway, start with global alignment, you can basically read a book > > chapter on that on day 1, code it using arrays in 1-2 days > thereafter > > (depending on what you know about Haskell) > > > > Gruss, > > Christian > > > > -- > > Kenneth Lui > > > _______________________________________________ > > Biohaskell mailing list > > [email protected] > > http://malde.org/cgi-bin/mailman/listinfo/biohaskell > > -- *Kenneth Lui* Computer Science Major (Class of 2012) University of British Columbia, Vancouver, Canada Email: [email protected] Phone number: 778 - 319 - 4321 (Canada) Website: http://ca.linkedin.com/in/hkkenneth
_______________________________________________ Biohaskell mailing list [email protected] http://malde.org/cgi-bin/mailman/listinfo/biohaskell
