Re: parsing fastq files with D

rikki cattermole via Digitalmars-d-learn Thu, 24 Mar 2016 01:36:58 -0700

On 24/03/16 9:24 PM, eastanon wrote:

On Thursday, 24 March 2016 at 06:34:51 UTC, rikki cattermole wrote:

As a little fun thing to do I implemented it for you.


It won't allocate, which makes it a good fit for you.
With a bit of work you could give Result its own buffers for each record
instead of using the input array, which would allow the source to be an
input range itself.

I made this up on dpaste and single quotes were not playing nicely
there. So you'll see "\r"[0] as a workaround.


Thank you very much. I think you have exposed me to a number of new
concepts that I will go through and annotate the code with. I read all
input from file as follows.

string text = cast(string)std.file.read(inputfile);
foreach(record;FastQRecord.parse(text)){
    writeln(record);
}

<naivequestion>Does this mean that text is allocated in memory? And is
there a better way to read and process the inputfile?</naivequestion>

Yup, the whole file ends up in memory as that string. But the way I designed parse, it won't allocate when returning a record. Just don't go around modifying that memory or keeping copies (not critical, but I wouldn't).
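To make the "won't allocate" point concrete, here is a minimal sketch of how slicing behaves in D. The sample text and variable names are made up for illustration and are not taken from the parser itself.

```d
import std.stdio : writeln;

void main()
{
    // Hypothetical one-record input, standing in for the file contents.
    string text = "@id1\nACGT\n+\nIIII\n";

    // A slice is just a pointer plus a length into `text`: no allocation.
    auto id = text[1 .. 4];
    assert(id.ptr == text.ptr + 1);

    // .idup makes an independent copy, for when the data must outlive
    // the original buffer. This one does allocate.
    string kept = id.idup;
    assert(kept.ptr != id.ptr);

    writeln(id, " ", kept);
}
```

A record built from slices like `id` stays cheap, but it is only valid for as long as the underlying buffer is kept alive and unchanged.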

There are better ways to read the input file. For example, byChunk would read it in smaller groups of, say, 4096 bytes. But you would need to build support for e.g. line parsing on top of that.
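A rough sketch of what line parsing on top of byChunk could look like. The helper names eachLineByChunk and stripCR are hypothetical, and the record count in main assumes plain four-line FASTQ records with no wrapped sequence lines.

```d
import std.stdio : File, writeln;

/// Drops a trailing '\r' so Windows line endings are handled too.
const(char)[] stripCR(const(char)[] s)
{
    return (s.length && s[$ - 1] == '\r') ? s[0 .. $ - 1] : s;
}

/// Calls `sink` once per line while reading the file in fixed-size chunks.
/// The slice handed to `sink` is only valid for the duration of the call.
void eachLineByChunk(string path, scope void delegate(const(char)[]) sink,
                     size_t chunkSize = 4096)
{
    auto file = File(path, "rb");
    char[] carry; // partial line that straddles a chunk boundary

    foreach (ubyte[] chunk; file.byChunk(chunkSize))
    {
        auto data = cast(const(char)[]) chunk;
        size_t start = 0;
        foreach (i, c; data)
        {
            if (c != '\n')
                continue;
            if (carry.length)
            {
                // Finish the line that began in an earlier chunk.
                carry ~= data[start .. i];
                sink(stripCR(carry));
                carry.length = 0;
                carry.assumeSafeAppend(); // reuse the carry buffer
            }
            else
                sink(stripCR(data[start .. i]));
            start = i + 1;
        }
        // byChunk reuses its buffer, so the unfinished tail must be copied.
        carry ~= data[start .. $];
    }
    if (carry.length)
        sink(stripCR(carry)); // last line without a trailing newline
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <file.fastq>");
        return;
    }
    size_t lineNo, records;
    eachLineByChunk(args[1], (line) {
        // Assumption: every record is exactly four lines, header first.
        if (lineNo % 4 == 0 && line.length && line[0] == '@')
            ++records;
        ++lineNo;
    });
    writeln(records, " records");
}
```

Only the carry buffer allocates here, and it grows to at most the longest line. If whole lines are all that is needed, File.byLine already does this bookkeeping.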


The way I'd go is memory mapped files as an input range.

After all, let the OS help you out with deciding when to put the file and parts of it into memory.

After all, if you have 1 GB files you want to parse, could you just as easily have 2 TB worth? Probably, since it is DNA after all.

You kinda don't want that all in memory at once if you know what I mean.
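For what it's worth, a minimal sketch of the memory-mapped approach using std.mmfile from Phobos. The countRecords helper is hypothetical, and it again assumes four-line records. I have not tried this on an empty file, where the mapping may well fail.

```d
import std.mmfile : MmFile;
import std.stdio : writeln;
import std.string : lineSplitter;

/// Counts records, assuming each one is exactly four lines, header first.
size_t countRecords(const(char)[] text)
{
    size_t lineNo, records;
    // lineSplitter is lazy and yields slices of `text`, so nothing is copied.
    foreach (line; text.lineSplitter)
    {
        if (lineNo % 4 == 0 && line.length && line[0] == '@')
            ++records;
        ++lineNo;
    }
    return records;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <file.fastq>");
        return;
    }

    // Maps the file read-only; the OS pages it in on demand.
    auto mmf = new MmFile(args[1]);
    scope (exit) destroy(mmf); // unmap when done

    // View the whole mapping as text without copying it.
    auto text = cast(const(char)[]) mmf[];
    writeln(countRecords(text), " records");
}
```

Slices taken from the mapping are only valid while the MmFile is alive, so the same "don't keep copies" advice applies unless you .idup what you need.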

