I lost my momentum learning D and want to regain it, so I need some help with this seemingly simple task:

# Fasta sequence


\>Entry1_ID header field1|header field2|...
CAGATATCTTTGATGTCCTGATTGGAAGGACCGTTGGCCCCCCACCCTTAGGCAG
TGTATACTCTTCCATAAACGAGCTATTAGTTATGAGGTCCGTAGATTGAAAAGGG
TGACGGAATTCGGCCGAACGGGAAAGACGGACATCTAGGTATCCTGAGCACGGTT
GCGCGTCCGTATCAAGCTCCTCTTTATAGGCCCCG
\>Entry2_ID header field1|header field4|...
GTTACTGTTGGTCGTAGAGCCCAGAACGGGTTGGGCAGATGTACGACAATATCGCT
TAGTCACCCTTGGGCCACGGTCCGCTACCTTACAGGAATTGAGA

\>Entry3_ID header field1|header field2|...
GGCAGTACGATCGCACGCCCCACGTGAACGATTGGTAAACCCTGTGGCCTGTGAGC
GACAAAAGCTTTAATGGGAAATACGCGCCCATAACTTGGTGCGA

# Some characteristics:

- Entry_ID is >[[:alphanumeric:]], where '>' marks the entry start. (In this post I have to put an escape character (\) in front to make the '>' visible.)
- Headers may contain annotation information separated by some delimiter (e.g. | in this case).
- The entry ID and header share a single line, which does not contain newline characters.
- Sequence under the header line is [ATCGN\n]* (Perl regex).
- A fasta file can be plain-text or gzip compressed.
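
To make the description above concrete, here is a minimal sketch of how such records could be grouped from a range of lines. It is only an illustration under assumptions: it uses plain Phobos (`std.string.lineSplitter`) rather than iopipe, it does not handle gzip input, it assumes the ID ends at the first space on the header line, and the names `FastaEntry`, `FastaRange` and `fastaEntries` are made up for this example.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.array : appender;
import std.string : lineSplitter, strip;

/// One FASTA record (hypothetical layout, not from any library).
struct FastaEntry
{
    string id;       // text between '>' and the first space
    string header;   // remainder of the header line, delimiters untouched
    string sequence; // sequence lines joined, newlines removed
}

/// Lazy input range that groups a range of lines into FastaEntry values.
struct FastaRange(Lines)
{
    private Lines lines;
    private FastaEntry current;
    private bool done;

    this(Lines lines)
    {
        this.lines = lines;
        // Skip anything that precedes the first header line.
        while (!this.lines.empty && !this.lines.front.startsWith(">"))
            this.lines.popFront();
        popFront(); // load the first entry, if any
    }

    @property bool empty() const { return done; }
    @property FastaEntry front() { return current; }

    void popFront()
    {
        if (lines.empty) { done = true; return; }

        // lines.front is a header line here; drop the leading '>'.
        auto parts = lines.front[1 .. $].findSplit(" ");
        current = FastaEntry(parts[0].idup, parts[2].idup, null);
        lines.popFront();

        // Collect sequence lines up to the next header or end of input.
        auto seq = appender!string();
        for (; !lines.empty && !lines.front.startsWith(">"); lines.popFront())
            seq.put(lines.front.strip); // blank lines contribute nothing
        current.sequence = seq.data;
    }
}

/// Convenience constructor so the line type is inferred.
auto fastaEntries(Lines)(Lines lines)
{
    return FastaRange!Lines(lines);
}

unittest
{
    auto text = ">Entry1_ID header field1|header field2\nCAGAT\nATCTT\n\n"
              ~ ">Entry2_ID header field1\nGTTAC\n";
    size_t count;
    foreach (e; fastaEntries(text.lineSplitter))
    {
        assert(e.id.length > 0 && e.sequence.length > 0);
        ++count;
    }
    assert(count == 2);
}
```

This gives an input range only. Each entry's sequence is copied into a freshly allocated string, which keeps the sketch safe with line sources that reuse their buffer but is probably not what one wants for files of dozens of gigabytes.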


# Goals:
Write a parser that uses D ranges with the iopipe library, for performance and ease of use. A big fasta file can be dozens of gigabytes.

# Questions:

1. How do I model a fasta entry with a struct or class?
2. How do I implement a range of fasta entries with iopipe? A range in this case can be a forward range, but preferably a random access range.
3. I want to do this with ranges to explore their power and elegance. But if performance is a big concern, what can I do instead?

