Hi, 1) http://hackage.haskell.org/packages/archive/bio/0.5.0.1/doc/html/Bio-Alignment-AAlign.html
2) Blast and other programs do sequence alignment as well and spit out their results in such formats. You don't need to look into these -- but you'll see the algorithms in later bioinformatics courses. 3) Yes, Needleman-Wunsch is a prime candidate for a first algorithm. It should basically look like this (in Haskell notation): needlemanWunsch :: M -> G -> S -> S -> (Score,[(Int,Int)]) You need a scoring matrix M, gap costs G and two input sequences S. You return the alignment score Score and a list of cells [(Int,Int)] that produce the score. Gruss, Christian * Kenneth Lui <[email protected]> [20.10.2011 12:08]: > Global alignment does sound like an interesting subject! > I have a few questions, > 1) does biohaskell already have any globabl alignment code? If so, in > which package? > 2) From the biohaskell wiki, I found the following information about > alignment: "Supported alignment formats: ACE, BlastXML, PSL (Blat), > Bowtie, Soap, GFF3, BED." and "The various alignment formats (Blast, PSL, > ACE) should be standardized and better integrated." How are they related > to the algorithm that I should implement. e.g. what are the input/output > format that it should support? > 3) From the wikepedia, it mentions the Needleman*Wunsch algorithm, is it > what I am supposed to implement? > For algebraic dynamic programming, a professor of mine has mentioned it as > well, hopefully I can dive further in it if possible. > Thanks Christian for all the info! > Cheers, > Kenneth > > On Thu, Oct 20, 2011 at 02:32, Christian Ho:ner zu Siederdissen > <[email protected]> wrote: > > * Ketil Malde <[email protected]> [20.10.2011 09:54]: > > > > One imortant piece of functionality that could be ripped from biolib > and > > made into a separate library, is the BLAST output parser^1. This > could > > also do with some cleanup, and would make a nice, standalone project. > > It's also fairly open-ended. If you're more interested in algorithms, > > there's some stuff for sequence alignments that I was never quite > > satisfied with. > > > > ^1 Christian, didn't you do something on this? > > Yeah, I completely forgot about that. The bits and pieces I have, once I > find them again ;-), are iteratee-code, however. Not the best thing to > start Haskell with. On the other hand, once you understand that stuff > you know a lot of high-level Haskell in addition to how to make Haskell > fast... > > == > > As a student project, the second idea on sequence alignments seems to be > more fun, though. And it would be useful in on its own. The sequence > alignment stuff can be done in a month as the algorithms are not that > complicated and you mostly just need to know Haskell arrays. > > These are possible tasks: > > - global alignment > - (backtracking) > - local alignment > - high-performance code > - unboxed arrays > - vector-based fusion operations > > If put backtracking in brackets as there are two interesting ways on > how to do alignments: have a forward pass calculating scores and find > out via backtracking what alignments produce this score. Or use s.th. > like "algebraic dynamic programming" (Giegerich et al) to do it all in > one pass. > > The order of tasks above should allow to stop at any point and have > something to show, that would be useful later on. Basically, if you > write that and stop after vector-fusion operations you come close to > C-code in terms of performance. > > Anyway, start with global alignment, you can basically read a book > chapter on that on day 1, code it using arrays in 1-2 days thereafter > (depending on what you know about Haskell) > > Gruss, > Christian > > -- > Kenneth Lui > _______________________________________________ > Biohaskell mailing list > [email protected] > http://malde.org/cgi-bin/mailman/listinfo/biohaskell
pgpaHtHdLqilt.pgp
Description: PGP signature
_______________________________________________ Biohaskell mailing list [email protected] http://malde.org/cgi-bin/mailman/listinfo/biohaskell
