Re: [Biohaskell] fasta-iteratee (iteratee-based fasta reading?)

Felipe Almeida Lessa Tue, 29 Nov 2011 16:40:45 -0800

On Tue, Nov 29, 2011 at 10:32 PM, Christian Höner zu Siederdissen
<[email protected]> wrote:
> how much interest is there for iteratee-based fasta reading? Has someone
> already written something?


I don't know.  While it would be nice, currently that's not something
that I need myself.

> Since iteratee- (or enumerator-based parsing in general) is strict in
> its output, there are some considerations regarding large files. On the
> other hand, sometime in early 2012 I'll probably provide a library to
> efficiently handle tasks on large sequence-based files.

What do you mean by "strict in its output"?  Do you mean that each
sequence of the FASTA file would need to be held in memory?

I guess there are two different FASTA readers possible, depending on
whether the stream is based on (these are just examples)

    data FastaSeq = FastaSeq SeqLabel SeqData

or

    data FastaItem = FastaLabel SeqLabel | FastaData SeqData

Using FastaSeq you get a simple-to-use interface, but each whole
sequence has to be held in memory.  Using FastaItem you get something
like a SAX parser: the stream (something like [FastaLabel ...,
FastaData ..., FastaData ..., FastaData ...], where each data chunk is
of a limited size) may be consumed in constant memory, but it's a
little bit more difficult to write programs against it.
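
As a rough sketch of what constant-memory consumption of a FastaItem
stream could look like, the following uses a plain lazy list as a
stand-in for an iteratee stream.  SeqLabel and SeqData are assumed to
be String here (the types above leave them open), and seqLengths is a
hypothetical helper, not part of any existing library.

```haskell
-- Assumption: placeholder types; the message does not define them.
type SeqLabel = String
type SeqData  = String

data FastaItem = FastaLabel SeqLabel | FastaData SeqData
  deriving (Eq, Show)

-- Hypothetical helper: report the length of every sequence without
-- ever concatenating its data chunks.  Only the current chunk and a
-- running counter are needed at any time.
seqLengths :: [FastaItem] -> [(SeqLabel, Int)]
seqLengths [] = []
-- Data chunks that appear before any label are ignored.
seqLengths (FastaData _ : rest) = seqLengths rest
seqLengths (FastaLabel l : rest) = go 0 rest
  where
    -- Add up chunk lengths, forcing the counter at each step.
    go n (FastaData d : more) =
      let n' = n + length d in n' `seq` go n' more
    -- Next label or end of stream: emit the result and continue.
    go n more = (l, n) : seqLengths more
```

The state has to be threaded by hand (here, the counter inside go),
which is the "more difficult to write" part.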

Assuming that we wrote some FASTA parser using enumeratees, I guess
FastaItem is the way to go, since it's possible to have an enumeratee
that converts FastaItems into FastaSeqs.
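
The conversion from FastaItems to FastaSeqs could look roughly like
this.  It is only a sketch: plain lists stand in for the enumeratee,
SeqLabel and SeqData are assumed to be String, and itemsToSeqs is a
made-up name.

```haskell
-- Assumption: placeholder types; the message does not define them.
type SeqLabel = String
type SeqData  = String

data FastaSeq = FastaSeq SeqLabel SeqData
  deriving (Eq, Show)

data FastaItem = FastaLabel SeqLabel | FastaData SeqData
  deriving (Eq, Show)

isData :: FastaItem -> Bool
isData (FastaData _) = True
isData _             = False

-- Hypothetical helper: glue each label together with the data chunks
-- that follow it.  Every resulting FastaSeq holds its whole sequence
-- in memory, which is the trade-off of the simpler interface.
itemsToSeqs :: [FastaItem] -> [FastaSeq]
itemsToSeqs [] = []
-- Data chunks that appear before any label are dropped.
itemsToSeqs (FastaData _ : rest) = itemsToSeqs rest
itemsToSeqs (FastaLabel l : rest) =
  let (chunks, rest') = span isData rest
  in  FastaSeq l (concat [d | FastaData d <- chunks]) : itemsToSeqs rest'
```

Going the other way (FastaSeqs into FastaItems) would lose nothing
either, but a FastaSeq-only parser could never give back the
constant-memory behaviour.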

Cheers,

-- 
Felipe.
_______________________________________________
Biohaskell mailing list
[email protected]
http://malde.org/cgi-bin/mailman/listinfo/biohaskell