On Tue, Nov 29, 2011 at 10:32 PM, Christian Höner zu Siederdissen
<[email protected]> wrote:
> how much interest is there for iteratee-based fasta reading? Has someone
> already written something?
I don't know. While it would be nice, currently that's not something
that I need myself.
> Since iteratee- (or enumerator-based parsing in general) is strict in
> its output, there are some considerations regarding large files. On the
> other hand, sometime in early 2012 I'll probably provide a library to
> efficiently handle tasks on large sequence-based files.
What do you mean by "strict in its output"? Do you mean that each
sequence of the FASTA file would need to be held in memory?
I guess there are two different FASTA readers possible, depending on
whether the stream is based on (just examples)
    data FastaSeq = FastaSeq SeqLabel SeqData
or
    data FastaItem = FastaLabel SeqLabel | FastaData SeqData
Using FastaSeq you get a simple-to-use interface that needs to hold
each sequence in memory. Using FastaItem you get something like a SAX
parser where the stream may be consumed in constant memory
(something like [FastaLabel ..., FastaData ..., FastaData ...,
FastaData ...] where each data chunk is of a limited size), but where
it's a little bit more difficult to write programs.
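To make the constant-memory point concrete, here is a rough sketch of a FastaItem consumer. It uses a plain lazy list as a stand-in for a real iteratee stream, and the String aliases for SeqLabel and SeqData, as well as the name seqLengths, are assumptions made only for this example. It computes the length of every sequence while looking at one data chunk at a time:

```haskell
-- Assumed placeholder types; a real library would likely use ByteString.
type SeqLabel = String
type SeqData  = String

data FastaItem = FastaLabel SeqLabel | FastaData SeqData
  deriving (Eq, Show)

-- | Length of each sequence, accumulated chunk by chunk.
-- Only the running count is kept, never the sequence data itself.
seqLengths :: [FastaItem] -> [(SeqLabel, Int)]
seqLengths [] = []
-- Data that appears before any label is skipped (an assumption).
seqLengths (FastaData _ : rest) = seqLengths rest
seqLengths (FastaLabel l : rest) = go 0 rest
  where
    -- Force the counter at every step so no chain of thunks builds up.
    go n (FastaData d : xs) = let n' = n + length d in n' `seq` go n' xs
    -- Next label or end of stream: emit the result and continue.
    go n xs = (l, n) : seqLengths xs
```

The extra bookkeeping in go is the "little bit more difficult" part: the consumer has to track which label the current chunks belong to.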
Assuming that we wrote some FASTA parser using enumeratees, I guess
FastaItem is the way to go, since it's possible to have an enumeratee
that converts FastaItems into FastaSeqs.
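As an illustration of that conversion, here is a minimal sketch over plain lazy lists. It is not written against any particular iteratee or enumerator library, and the String aliases and the name toSeqs are assumptions for the example only. A real enumeratee would do the same grouping on stream chunks:

```haskell
-- Assumed placeholder types; a real library would likely use ByteString.
type SeqLabel = String
type SeqData  = String

data FastaSeq  = FastaSeq SeqLabel SeqData
  deriving (Eq, Show)

data FastaItem = FastaLabel SeqLabel | FastaData SeqData
  deriving (Eq, Show)

-- | Group a stream of items into whole sequences.
-- Each label collects the data chunks that follow it, up to the next label.
toSeqs :: [FastaItem] -> [FastaSeq]
toSeqs [] = []
-- Data that appears before any label is dropped (an assumption).
toSeqs (FastaData _ : rest) = toSeqs rest
toSeqs (FastaLabel l : rest) =
  let (chunks, rest') = span isData rest
  in  FastaSeq l (concat [d | FastaData d <- chunks]) : toSeqs rest'

-- | True for data chunks, False for labels.
isData :: FastaItem -> Bool
isData (FastaData _) = True
isData _             = False
```

Note that toSeqs necessarily holds one full sequence in memory at a time, which is exactly the trade-off of the FastaSeq interface.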
Cheers,
--
Felipe.
_______________________________________________
Biohaskell mailing list
[email protected]
http://malde.org/cgi-bin/mailman/listinfo/biohaskell