On Wednesday, 3 January 2018 at 16:09:19 UTC, Steven Schveighoffer wrote:
On 1/3/18 9:45 AM, Andrew wrote:
Hi,

I have a very large gzipped text file (all ASCII characters, ~500GB) that I want to stream and process line by line. I thought the iopipe library would be perfect for this, but I can't get it to work. This is the closest I have come:

import iopipe.textpipe;
import iopipe.zip;
import iopipe.bufpipe;
import iopipe.stream;

void main()
{

   auto fileToRead = openDev("file.gz").bufd.unzip(CompressionFormat.gzip);

   foreach (line; fileToRead.assumeText.byLineRange!false)
   {
      // do stuff
   }
}

However, this only processes the first 200 or so lines (I guess just the initial read into the buffer). Can anyone help me out?

Do you have a sample file I can play with? Your iopipe chain looks correct, so I'm not sure why it wouldn't work.

-Steve

A sample file (about 250MB) can be found here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

It should have 1,103,800 lines, but the following code only reports 256:

import iopipe.textpipe;
import iopipe.zip;
import iopipe.bufpipe;
import iopipe.stream;
import std.stdio;

void main()
{

   auto fileToRead = openDev("ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz").bufd.unzip(CompressionFormat.gzip);

   auto counter = 0;
   foreach (line; fileToRead.assumeText.byLineRange!false)
   {
      counter++;
   }
   writeln(counter);
}
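
For an independent cross-check of the expected line count, one option is to let the system gzip tool do the decompression and count lines using only Phobos. This is a rough sketch, not part of iopipe: it assumes a `gzip` executable is on the PATH, and `countOutputLines` is a hypothetical helper name introduced just for this example.

```d
import std.process : pipeProcess, Redirect, wait;
import std.stdio : writeln;

// Hypothetical helper: runs `cmd` and counts the lines it writes to stdout.
size_t countOutputLines(string[] cmd)
{
    // Only stdout is redirected; stderr stays attached to the terminal.
    auto pipes = pipeProcess(cmd, Redirect.stdout);
    size_t n = 0;
    foreach (line; pipes.stdout.byLine)
        n++;
    // Reap the child process once its output is exhausted.
    wait(pipes.pid);
    return n;
}

void main()
{
    // Assumption: `gzip -dc` writes the decompressed file to stdout.
    writeln(countOutputLines(["gzip", "-dc",
        "ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"]));
}
```

If this count differs from the 256 reported by the iopipe version, the difference comes from the decompression step and not from the line splitting.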

Thanks for looking into this.

Andrew
