fasta parser with iopipe?

biocyberman via Digitalmars-d-learn Wed, 23 Aug 2017 02:56:24 -0700

I lost my momentum learning D and want to regain it. Therefore I need some help with this seemingly simple task:


# Fasta sequence

\>Entry1_ID header field1|header field2|...
CAGATATCTTTGATGTCCTGATTGGAAGGACCGTTGGCCCCCCACCCTTAGGCAG
TGTATACTCTTCCATAAACGAGCTATTAGTTATGAGGTCCGTAGATTGAAAAGGG
TGACGGAATTCGGCCGAACGGGAAAGACGGACATCTAGGTATCCTGAGCACGGTT
GCGCGTCCGTATCAAGCTCCTCTTTATAGGCCCCG
\>Entry2_ID header field1|header field4|...
GTTACTGTTGGTCGTAGAGCCCAGAACGGGTTGGGCAGATGTACGACAATATCGCT
TAGTCACCCTTGGGCCACGGTCCGCTACCTTACAGGAATTGAGA

\>Entry3_ID header field1|header field2|...
GGCAGTACGATCGCACGCCCCACGTGAACGATTGGTAAACCCTGTGGCCTGTGAGC
GACAAAAGCTTTAATGGGAAATACGCGCCCATAACTTGGTGCGA


# Some characteristics:

- Entry_ID matches `>[[:alnum:]_]+`, where '>' marks the start of an entry. In this post I have to put an escape character (`\`) to make the '>' visible.
- Headers may contain annotation information separated by some delimiter (`|` in this case).
- The entry ID and header form a single line, which does not contain newline characters.
- The sequence under the header line is `[ATCGN\n]*` (Perl regex).
- A fasta file can be plain text or gzip compressed.
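The rules above can be checked one line at a time. Here is a minimal sketch in D using Phobos only (the function names `isHeaderLine`, `isSequenceLine` and `looksGzipped` are made up for illustration). It assumes line terminators have already been stripped, and it detects gzip from the first two bytes, which RFC 1952 fixes as 0x1F 0x8B:

```d
import std.algorithm.searching : all, startsWith;

/// A header line starts with '>' (the entry start marker).
bool isHeaderLine(const(char)[] line)
{
    return line.startsWith(">");
}

/// A sequence line contains only A, T, C, G or N. An empty line also
/// passes, matching the `[ATCGN\n]*` rule (the newline itself is assumed
/// to be stripped by whatever splits the input into lines).
bool isSequenceLine(const(char)[] line)
{
    return line.all!(c => c == 'A' || c == 'T' || c == 'C' || c == 'G' || c == 'N');
}

/// gzip streams begin with the magic bytes 0x1F 0x8B (RFC 1952), so the
/// first two bytes of a file are enough to tell gzip from plain text.
bool looksGzipped(const(ubyte)[] firstBytes)
{
    return firstBytes.length >= 2 && firstBytes[0] == 0x1F && firstBytes[1] == 0x8B;
}
```

Actual decompression could go through Phobos's `std.zlib.UnCompress` constructed with `HeaderFormat.gzip`. That part is not shown here.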


# Goals:

Write a parser that uses D ranges with the iopipe library for performance and ease of use. A big fasta file can be dozens of gigabytes, so the parser has to stream the input rather than load it all into memory.
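To make the streaming requirement concrete, here is a minimal sketch of a single pass that keeps only counters, so memory use stays constant regardless of file size. It assumes the input is already available as a range of lines. Phobos's `File.byLine` is used as a stand-in line source; iopipe's own line splitting could take its place, but its API is not shown here. `FastaStats` and `countFasta` are hypothetical names.

```d
import std.algorithm.searching : startsWith;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeString;

/// Running totals for one pass over a FASTA stream.
struct FastaStats
{
    size_t entries; // number of '>' header lines seen
    ulong bases;    // total characters on sequence lines
}

/// Walks any input range of lines once and keeps only counters, so memory
/// use does not grow with the size of the input.
FastaStats countFasta(R)(R lines)
if (isInputRange!R && isSomeString!(ElementType!R))
{
    FastaStats stats;
    foreach (line; lines)
    {
        if (line.startsWith(">"))
            ++stats.entries;
        else
            stats.bases += line.length; // assumes ASCII bases and no '\r'
    }
    return stats;
}
```

With Phobos this could be called as `countFasta(File("reads.fa").byLine)` (needs `import std.stdio : File;`; `reads.fa` is a placeholder path). `byLine` reuses its buffer, which is safe here because no line is stored.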


# Questions:

1. How do I model a fasta entry with a struct or class?

2. How do I implement a range of fasta entries with iopipe? A forward range would work here, but a random access range is preferable.

3. I want to use ranges to explore their power and elegance. But if performance is a big concern, what alternatives do I have?
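For questions 1 and 2, one possible shape is a plain struct for the entry plus a lazy input range that assembles entries from any range of lines. This is a sketch under assumptions, not an iopipe solution: the line source is any Phobos-style range of strings, the ID is assumed to end at the first space in the header, and `FastaEntry`, `FastaEntries` and `fastaEntries` are hypothetical names.

```d
import std.algorithm.searching : startsWith;
import std.array : appender, split;
import std.conv : to;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.string : indexOf;

/// One FASTA record as a plain value type (no class allocation needed).
struct FastaEntry
{
    string id;        // text after '>' up to the first space (assumed)
    string[] fields;  // rest of the header, split on '|'
    string sequence;  // all sequence lines joined, newlines dropped
}

/// Lazy input range of FastaEntry over any input range of lines.
struct FastaEntries(R)
if (isInputRange!R)
{
    private R lines;
    private FastaEntry current;
    private bool done;

    this(R lines)
    {
        this.lines = lines;
        // Skip anything before the first header line.
        while (!this.lines.empty && !this.lines.front.startsWith(">"))
            this.lines.popFront();
        popFront(); // parse the first entry
    }

    @property bool empty() const { return done; }
    @property FastaEntry front() { return current; }

    void popFront()
    {
        if (lines.empty) { done = true; return; }
        // lines.front is a header here; copy it because a line source
        // such as File.byLine reuses its buffer.
        string header = lines.front[1 .. $].to!string;
        lines.popFront();

        auto sp = header.indexOf(' ');
        current.id = sp < 0 ? header : header[0 .. sp];
        current.fields = sp < 0 ? null : header[sp + 1 .. $].split('|');

        // Collect sequence lines until the next header or end of input.
        auto seq = appender!string();
        while (!lines.empty && !lines.front.startsWith(">"))
        {
            seq.put(lines.front);
            lines.popFront();
        }
        current.sequence = seq.data;
    }
}

/// Convenience constructor so R is inferred.
auto fastaEntries(R)(R lines) { return FastaEntries!R(lines); }
```

Usage would look like `foreach (e; fastaEntries(File("reads.fa").byLine)) { ... }`, with `reads.fa` as a placeholder. A struct fits because an entry is a small aggregate of slices that is cheap to copy. `File.byLine` is only an input range; if the line source were a forward range, adding a `save` that copies `lines` and `current` would make this a forward range. Random access over a stream is not possible without first building an index of header offsets in a separate pass.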
