On Tuesday, 12 February 2013 at 22:06:48 UTC, monarch_dodra wrote:
On Tuesday, 12 February 2013 at 21:41:14 UTC, bioinfornatics wrote:

Sometimes FASTQ files are compressed to gz, bz2, or xz, as they are often huge.
Maybe we need to keep this in mind early in development and use std.zlib.

While working on making the parser compatible with multi-threading, I was able to separate the part that feeds data from the part that parses it.

Long story short, the parser now operates on an input range of ubyte[]: it is no longer responsible for acquiring data.

The range can be a simple (wrapped) File, a byChunk, an asynchronous file reader, a zip decompressor, or just stdin, I guess. The range can be transient.
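To give an idea of the shape this takes (illustrative signature only, not my actual parser's API), the parser can be templated on any input range of ubyte[] chunks:

import std.range : ElementType, isInputRange;

// Illustrative only: a parser templated on any input range of
// ubyte[] chunks, so it never touches the file system itself.
struct FastqParser(Range)
    if (isInputRange!Range && is(ElementType!Range : const(ubyte)[]))
{
    private Range input;

    this(Range input) { this.input = input; }

    // Parsing walks the chunks from `input`, copying what it needs
    // into local buffers, since the chunks may be transient.
}

// Convenience constructor with type inference.
auto fastqParser(Range)(Range input)
{
    return FastqParser!Range(input);
}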

However, now that you mention it, I'll make sure it is correctly supported.

I'll *try* to show you what I have so far tomorrow (in about 18 hours).

Yeah... I played around too much, and the file is dirtier than ever.

The good news is that I was able to test what I was telling you about: accepting any range works.

I plugged your ZFile range into my parser, and I can now parse zipped files directly.
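ZFile itself isn't reproduced here, but a minimal equivalent built on std.zlib.UnCompress could look like this (GzipChunks, the chunk size, and the details are my own sketch, not ZFile's actual API):

import std.range : isInputRange;
import std.stdio : File;
import std.zlib : HeaderFormat, UnCompress;

// Sketch: lazily decompress a gzip file into a range of ubyte[] chunks.
struct GzipChunks
{
    private File file;
    private UnCompress uncomp;
    private ubyte[] buffer;
    private ubyte[] current;
    private bool flushed;

    this(string path, size_t chunkSize = 64 * 1024)
    {
        file = File(path, "rb");
        uncomp = new UnCompress(HeaderFormat.gzip);
        buffer = new ubyte[chunkSize];
        popFront(); // prime the first decompressed chunk
    }

    @property bool empty() { return current.length == 0; }
    @property ubyte[] front() { return current; }

    void popFront()
    {
        current = null;
        while (current.length == 0 && !file.eof)
        {
            auto raw = file.rawRead(buffer);
            if (raw.length)
                current = cast(ubyte[]) uncomp.uncompress(raw.dup); // .dup: UnCompress may keep a slice
        }
        if (current.length == 0 && !flushed)
        {
            current = cast(ubyte[]) uncomp.flush(); // trailing buffered data
            flushed = true;
        }
    }
}

static assert(isInputRange!GzipChunks);

With something like that in place, parsing a zipped file is just a matter of feeding the range to the parser, e.g. fastqParser(GzipChunks("reads.fastq.gz")).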

The good news is that now I'm not bottlenecked by IO anymore! The bad news is that I'm now bottlenecked by the CPU doing the decompression. Since I'm using dmd, though, you may get better results with LDC or GDC.

In any case, I am now parsing the 6 GB file (packed down to 1.5 GB) in about 53 seconds (down from 61). I also tried a dual-threaded approach (one thread to unzip, one thread to parse), but again, the actual *parse* phase is so ridiculously fast that it changes *nothing* in the final result.
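The two-thread split I tried was roughly of this shape (a sketch with reconstructed names, not my actual test code): one thread unzips and sends immutable chunks, the owner thread parses.

import std.concurrency : ownerTid, receive, send, spawn;
import std.stdio : File;
import std.zlib : HeaderFormat, UnCompress;

// One thread decompresses, the owner parses (sketch only).
void unzipWorker(string path)
{
    auto file = File(path, "rb");
    auto uncomp = new UnCompress(HeaderFormat.gzip);
    auto buffer = new ubyte[64 * 1024];
    while (!file.eof)
    {
        auto raw = file.rawRead(buffer);
        if (raw.length)
        {
            // idup: message contents must not share mutable data
            auto chunk = (cast(const(ubyte)[]) uncomp.uncompress(raw.dup)).idup;
            if (chunk.length)
                ownerTid.send(chunk);
        }
    }
    ownerTid.send(true); // done sentinel
}

void main()
{
    spawn(&unzipWorker, "big.fastq.gz");
    for (bool done; !done; )
    {
        receive(
            (immutable(ubyte)[] chunk) { /* parse the chunk */ },
            (bool) { done = true; }
        );
    }
}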

Long story short: 99% of the time is spent acquiring data; the remaining 1% is just copying it into local buffers.

The final bit of good news is that a CPU bottleneck is always better than an IO bottleneck. If you have multiple cores, you should be able to run multiple *instances* (not threads) and process several files at once, multiplying your throughput.
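For example (a sketch; the fastq_parser binary name is assumed), a tiny D driver can spawn one process per file and wait for all of them:

import std.algorithm : endsWith;
import std.file : SpanMode, dirEntries;
import std.process : Pid, spawnProcess, wait;

// Launch one parser process per .fastq.gz file in the current directory.
void main()
{
    Pid[] pids;
    foreach (entry; dirEntries(".", SpanMode.shallow))
        if (entry.name.endsWith(".fastq.gz"))
            pids ~= spawnProcess(["./fastq_parser", entry.name]);
    foreach (pid; pids)
        wait(pid); // block until every instance finishes
}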
