On 5/11/17 8:18 PM, H. S. Teoh via Digitalmars-d-learn wrote:
> On Wed, May 10, 2017 at 11:40:08PM +0000, Jesse Phillips via
> Digitalmars-d-learn wrote:
>> If you can get the zip to decompress into a range of dchar, then
>> std.csv will work with it. It is far from the fastest; much of the
>> speed is lost because it supports input ranges and doesn't
>> specialize on any other range type.
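
For reference, feeding std.csv looks roughly like this (the csvReader
call follows the documented std.csv interface; a plain string qualifies
because autodecoding presents it as an input range of dchar, and for
the zip case the decompressed ubyte[] would first need to be turned
into text, e.g. via std.string.assumeUTF):

import std.csv;
import std.typecons : Tuple;
import std.stdio : writefln;

void main()
{
    // any input range of dchar works; a string qualifies via autodecoding
    auto text = "Joe,Carpenter,300000\nFred,Blacksmith,400000";
    foreach (record; csvReader!(Tuple!(string, string, int))(text))
        writefln("%s is a %s earning %s", record[0], record[1], record[2]);
}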
> I actually spent some time today looking into whether fastcsv could
> be made to work with general input ranges, as long as they support
> slicing... and immediately ran into the infamous autodecoding issue:
> strings are not random-access ranges because of autodecoding, so it
> would require either extensive code surgery to make it work, or ugly
> hacks to bypass autodecoding. I'm quite tempted to attempt the
> latter, in fact, but not now, since it's getting busier at work and I
> don't have much free time to spend on a major refactoring of fastcsv.
Yeah, iopipe treats char[] as a random-access, sliceable range :)
Autodecoding gets annoying if you want to do anything fancy (like
chain(somestr, someotherstr)).
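
To make that concrete, here's a small illustration using only standard
std.range and std.utf machinery (nothing iopipe-specific):

import std.range : chain, ElementType, isRandomAccessRange, hasSlicing;
import std.utf : byCodeUnit;

// autodecoding: a string iterates by dchar and loses random access/slicing
static assert(is(ElementType!string == dchar));
static assert(!isRandomAccessRange!string);
static assert(!hasSlicing!string);

// chain inherits the decoded element type and the weaker range category
alias C = typeof(chain("foo", "bar"));
static assert(is(ElementType!C == dchar));
static assert(!isRandomAccessRange!C);

// byCodeUnit is one way to bypass it: char granularity, random access back
static assert(isRandomAccessRange!(typeof("foo".byCodeUnit)));

void main() {}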
> Alternatively, I could possibly hack together a version of fastcsv
> that takes a range of const(char)[] as input (rather than a single
> string), so that, in theory, it could handle arbitrarily large input
> files as long as the caller can provide a range of data blocks, e.g.
> File.byChunk, or in this particular case, a range of decompressed
> data blocks from whatever decompressor is used to extract the data.
> As long as you consume the individual rows without storing references
> to them indefinitely (i.e., don't try to build an array of the entire
> dataset), fastcsv's optimizations should still work, since
> unreferenced blocks will eventually be cleaned up by the GC when
> memory runs low.
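
From the caller's side, driving such a hypothetical block-based
variant might look like the sketch below; the parser itself is elided,
and note that File.byChunk recycles its internal buffer, so each block
has to be consumed before the next one is fetched:

import std.algorithm : map;
import std.stdio : File;

void main()
{
    // a range of const(char)[] blocks, as described above; byChunk
    // reuses its buffer, so slices into a block are only valid until
    // the next block arrives, matching the "don't retain rows" caveat
    auto blocks = File("data.csv").byChunk(64 * 1024)
                                  .map!(chunk => cast(const(char)[]) chunk);

    const(char)[] carry; // partial row straddling a block boundary
    foreach (block; blocks)
    {
        // a block-aware fastcsv would parse the full rows out of
        // carry ~ block here, copying only the trailing partial row
        // into carry before the block's memory is recycled
    }
}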
I'm interested in getting a fast CSV parser built on top of iopipe. I
may fork your code and see if I can get it to work. Since you already
operate on arrays, it should be quite simple: arrays are also iopipes
by default.
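
For the curious, the reason that works: iopipe's core interface is
just window/extend/release, and iopipe.traits supplies those three for
plain arrays (the whole array is the window, and extend can never add
more). Here's a sketch of a consumer written against that interface;
the names are from iopipe's README and from memory, so treat it as
untested:

import iopipe.traits; // provides window/extend/release for arrays
import std.algorithm : count;

// counts rows in any iopipe of characters by walking its window
size_t countRows(Pipe)(Pipe input) if (isIopipe!Pipe)
{
    size_t rows;
    do
    {
        rows += input.window.count('\n'); // process the current window
        input.release(input.window.length); // discard what we've seen
    } while (input.extend(0) > 0); // ask the source for more data
    return rows;
}

void main()
{
    // an array is already an iopipe: its extend just returns 0
    assert(countRows("a,b\nc,d\n") == 2);
}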
-Steve