On 24/03/16 9:24 PM, eastanon wrote:
On Thursday, 24 March 2016 at 06:34:51 UTC, rikki cattermole wrote:
As a little fun thing to do I implemented it for you.

It won't allocate, which makes it perfect for you.
With a bit of work you could make Result keep its own buffers for the
result instead of using the input array, which would allow the source
to be an input range itself.

I made this up on dpaste and single quotes were not playing nicely
there. So you'll see "\r"[0] as a workaround.

Thank you very much. I think you have exposed me to a number of new
concepts that I will go through and annotate the code with. I read all
input from file as follows.

// Read the whole file into memory, then iterate over the parsed records.
string text = cast(string) std.file.read(inputfile);
foreach (record; FastQRecord.parse(text)) {
    writeln(record);
}

<naivequestion>Does this mean that text is allocated in memory? And is
there a better way to read and process the inputfile? </naivequestion>

Yup, the whole file gets read into memory as one string. The way I designed parse, it won't allocate when returning a record, since the records refer back into that memory. Just don't go around modifying that memory or keeping copies (not critical, but I wouldn't).
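
To illustrate why handing back a piece of the input does not allocate, here is a minimal sketch. firstLine is a hypothetical helper made up for this example, not the FastQRecord.parse from this thread; it only shows that a D slice points into the original buffer.

```d
/// Returns the first line of `text` as a slice of `text`; nothing is copied.
/// Hypothetical helper for illustration only.
inout(char)[] firstLine(inout(char)[] text) @nogc nothrow pure
{
    foreach (i, c; text)
        if (c == '\n')
            return text[0 .. i]; // a view into the caller's memory
    return text;                 // no newline: the whole input is one line
}

unittest
{
    string text = "@id1\nACGT\n";
    auto line = firstLine(text);
    assert(line == "@id1");
    assert(line.ptr is text.ptr); // same memory as the input
}
```

The @nogc attribute makes the compiler reject any GC allocation inside the function, which is a cheap way to check a claim like "it won't allocate".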

There are better ways to read the input file. For example, byChunk would read it in smaller pieces of, say, 4096 bytes. But you would need to layer support for e.g. line parsing on top of that, since a line can straddle two chunks.
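
A rough sketch of what line parsing on top of byChunk could look like. eachLineByChunk and its parameters are names invented for this example; it does not strip a trailing "\r", and the line passed to the callback is only valid during the call.

```d
import std.stdio : File;

/// Calls `sink` once per line of `path`, reading the file in fixed-size chunks.
/// A partial line at the end of a chunk is carried over to the next chunk.
void eachLineByChunk(string path,
                     scope void delegate(const(char)[] line) sink,
                     size_t chunkSize = 4096)
{
    char[] carry; // unfinished line left over from earlier chunks
    foreach (ubyte[] chunk; File(path).byChunk(chunkSize))
    {
        auto data = cast(const(char)[]) chunk; // byChunk reuses this buffer
        size_t start = 0;
        foreach (i, c; data)
        {
            if (c != '\n')
                continue;
            if (carry.length)
            {
                carry ~= data[start .. i]; // finish the carried-over line
                sink(carry);
                carry.length = 0;
            }
            else
                sink(data[start .. i]);    // line lies entirely in this chunk
            start = i + 1;
        }
        carry ~= data[start .. $];         // copy the tail before the buffer is reused
    }
    if (carry.length)
        sink(carry);                       // last line without a trailing newline
}

unittest
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "bychunk_sketch.fq");
    write(path, "@id1\nACGT\n+\nIIII\n");
    scope (exit) remove(path);

    string[] lines;
    // A tiny chunk size forces lines to straddle chunk boundaries.
    eachLineByChunk(path, (line) { lines ~= line.idup; }, 3);
    assert(lines == ["@id1", "ACGT", "+", "IIII"]);
}
```

Only lines that straddle a chunk boundary get copied into the carry buffer; everything else is passed on as a slice of the current chunk.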

The way I'd go is a memory-mapped file used as an input range.
That lets the OS help you out by deciding when to put the file, or parts of it, into memory.
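
A minimal sketch of the memory-mapped approach using std.mmfile.MmFile. countRecordsMapped is a name made up for this example, and it assumes every FASTQ record is exactly four lines (no wrapped sequences), which may not hold for your data.

```d
import std.mmfile : MmFile;
import std.string : lineSplitter;

/// Counts FASTQ records in `path` through a read-only memory mapping.
/// Assumption: each record occupies exactly four lines.
size_t countRecordsMapped(string path)
{
    auto mm = new MmFile(path);           // maps the file read-only
    scope (exit) destroy(mm);             // unmap on the way out
    auto text = cast(const(char)[]) mm[]; // view the mapped bytes as text, no copy
    size_t lines;
    foreach (line; text.lineSplitter)     // each `line` is a slice into the mapping
        ++lines;
    return lines / 4;
}

unittest
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "mmfile_sketch.fq");
    write(path, "@id1\nACGT\n+\nIIII\n@id2\nTTGA\n+\nIIII\n");
    scope (exit) remove(path);

    assert(countRecordsMapped(path) == 2);
}
```

The slices must not be used after the mapping is destroyed, which is why this sketch returns only a count.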

After all, if you have 1 GB files you want to parse, could you just as easily have 2 TB worth? Probably, since it is DNA after all.
You kinda don't want all of that in memory at once, if you know what I mean.
