Re: Read and write gzip files easily.

2017-05-11 Thread Nordlöw via Digitalmars-d

On Sunday, 3 May 2015 at 14:35:49 UTC, Per Nordlöw wrote:

Latest at

https://github.com/nordlow/justd/blob/master/zio.d


Should be 
https://github.com/nordlow/phobos-next/blob/master/src/zio.d


Re: Read and write gzip files easily.

2015-05-03 Thread Russel Winder via Digitalmars-d
And there is Zipios++

http://zipios.sourceforge.net/


On Sun, 2015-05-03 at 14:33 +0000, via Digitalmars-d wrote:
 On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels 
 wrote:
  Hi Kamil,
  I am glad someone has the exact same problem as I had. I 
  actually solved this, inspired by the python API you quoted 
  above. I wrote these classes:
  GzipInputRange, GzipByLine, and GzipOut.
  Here is how I can now use them:
  
 
 I've polished your module a bit at:
 
 https://github.com/nordlow/justd/blob/611ae3aac35a085af966e0c3b717deb0012f637b/zio.d
 
 Reflections:
 
 - Performance is terrible even with -release -noboundscheck 
 -unittest. About 20 times slower than zcat $F | wc -l. I'm 
 guessing
 
  _chunkRange.front.dup
 
 slows things down. I tried removing the .dup but then I get
 
  std.zlib.ZlibException@std/zlib.d(59): data error
 
 I don't believe we should have to do a copy of _chunkRange.front
 but I can't figure out how to solve it. Does anybody understand how
 to fix this?
 
 - Shouldn't GzipOut.finish() call this.close()? Otherwise the 
 file remains unflushed.
 - And what about calling this.close() in GzipOut.~this()? Is that
 needed too?
-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder




Re: Read and write gzip files easily.

2015-05-03 Thread via Digitalmars-d
On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels 
wrote:

Hi Kamil,
I am glad someone has the exact same problem as I had. I 
actually solved this, inspired by the python API you quoted 
above. I wrote these classes:

GzipInputRange, GzipByLine, and GzipOut.
Here is how I can now use them:



I've polished your module a bit at:

https://github.com/nordlow/justd/blob/611ae3aac35a085af966e0c3b717deb0012f637b/zio.d

Reflections:

- Performance is terrible even with -release -noboundscheck 
-unittest. About 20 times slower than zcat $F | wc -l. I'm 
guessing


_chunkRange.front.dup

slows things down. I tried removing the .dup but then I get

std.zlib.ZlibException@std/zlib.d(59): data error

I don't believe we should have to do a copy of _chunkRange.front
but I can't figure out how to solve it. Does anybody understand how
to fix this?


- Shouldn't GzipOut.finish() call this.close()? Otherwise the 
file remains unflushed.
- And what about calling this.close() in GzipOut.~this()? Is that
needed too?


Re: Read and write gzip files easily.

2015-05-03 Thread via Digitalmars-d

I've polished your module a bit at:

https://github.com/nordlow/justd/blob/611ae3aac35a085af966e0c3b717deb0012f637b/zio.d

Latest at

https://github.com/nordlow/justd/blob/master/zio.d


Re: Read and write gzip files easily.

2014-02-20 Thread Stephan Schiffels
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski 
wrote:
Hi there, I'm new to D and have a lot of learning ahead of me. 
It would
be extremely helpful to me if someone with D experience could 
show me

some code examples.

I'd like to neatly read and write gzipped files for my work. I 
have read
several threads on these forums on the topic of std.zlib or 
std.zip and I haven't been able to figure it out.




Hi Kamil,
I am glad someone has the exact same problem as I had. I actually 
solved this, inspired by the python API you quoted above. I wrote 
these classes:

GzipInputRange, GzipByLine, and GzipOut.
Here is how I can now use them:

_

import gzip;
import std.stdio;

void main() {
  auto byLine = new GzipByLine("test.gz");
  foreach(line; byLine)
    writeln(line);

  auto gzipOutFile = new GzipOut("testout.gz");
  gzipOutFile.compress("bla bla bla");
  gzipOutFile.finish();
}

That is all quite convenient and I was wondering whether
something like that would be useful even in Phobos. But it's
clear that for Phobos things would involve a lot more work to
comply with the requirements. This so far simply served my needs
and is not as generic as it could be.


Here is the code:

___gzip.d__
import std.zlib;
import std.stdio;
import std.range;
import std.traits;

class GzipInputRange {
  UnCompress uncompressObj;
  File f;
  auto CHUNKSIZE = 0x4000;
  ReturnType!(f.byChunk) chunkRange;
  bool exhausted;
  char[] uncompressedBuffer;
  size_t bufferIndex;

  this(string filename) {
    f = File(filename, "r");
    chunkRange = f.byChunk(CHUNKSIZE);
    uncompressObj = new UnCompress();
    load();
  }

  // Decompress the next chunk, or flush once the file is exhausted.
  void load() {
    if(!chunkRange.empty) {
      // copy the chunk before handing it to zlib
      auto raw = chunkRange.front.dup;
      chunkRange.popFront();
      uncompressedBuffer = cast(char[])uncompressObj.uncompress(raw);
      bufferIndex = 0;
    }
    else {
      if(!exhausted) {
        uncompressedBuffer = cast(char[])uncompressObj.flush();
        exhausted = true;
        bufferIndex = 0;
      }
      else
        uncompressedBuffer.length = 0;
    }
  }

  @property char front() {
    return uncompressedBuffer[bufferIndex];
  }

  void popFront() {
    bufferIndex += 1;
    if(bufferIndex >= uncompressedBuffer.length) {
      load();
      bufferIndex = 0;
    }
  }

  @property bool empty() {
    return uncompressedBuffer.length == 0;
  }
}

class GzipByLine {
  GzipInputRange range;
  char[] buf;

  this(string filename) {
    this.range = new GzipInputRange(filename);
    popFront();
  }

  @property bool empty() {
    return buf.length == 0;
  }

  // Collect characters up to (not including) the next newline.
  void popFront() {
    buf.length = 0;
    while(!range.empty && range.front != '\n') {
      buf ~= range.front;
      range.popFront();
    }
    range.popFront();
  }

  string front() {
    return buf.idup;
  }
}

class GzipOut {
  Compress compressObj;
  File f;

  this(string filename) {
    f = File(filename, "w");
    compressObj = new Compress(HeaderFormat.gzip);
  }

  void compress(string s) {
    auto compressed = compressObj.compress(s.dup);
    f.rawWrite(compressed);
  }

  // Must be called once at the end to write the remaining data.
  void finish() {
    auto compressed = compressObj.flush();
    f.rawWrite(compressed);
  }
}




Re: Read and write gzip files easily.

2014-02-20 Thread Kamil Slowikowski
On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels 
wrote:

Hi Kamil,
I am glad someone has the exact same problem as I had. I 
actually solved this, inspired by the python API you quoted 
above. I wrote these classes:

GzipInputRange, GzipByLine, and GzipOut.


Stephan, awesome! Thank you very much for sharing your classes. 
It's nice to see how you've approached this problem. Your code is 
very clear and easy to understand (for me).


Also, I now see the error in my code: I believe I should use 
rawWrite to write compressed data and not writeExact.


Re: Read and write gzip files easily.

2014-02-20 Thread Artem Tarasov
On Thu, Feb 20, 2014 at 9:05 PM, Kamil Slowikowski
<kslowikow...@gmail.com> wrote:


 Also, I now see the error in my code: I believe I should use rawWrite to
 write compressed data and not writeExact.


That's not an error, those are two different ways to access files:
std.stream.File and std.stdio.File. The latter is the recommended one.


Re: Read and write gzip files easily.

2014-02-20 Thread Stephan Schiffels

On Thursday, 20 February 2014 at 17:05:37 UTC, Kamil Slowikowski
wrote:
On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan 
Schiffels wrote:

Hi Kamil,
I am glad someone has the exact same problem as I had. I 
actually solved this, inspired by the python API you quoted 
above. I wrote these classes:

GzipInputRange, GzipByLine, and GzipOut.


Stephan, awesome! Thank you very much for sharing your classes. 
It's nice to see how you've approached this problem. Your code 
is very clear and easy to understand (for me).


Also, I now see the error in my code: I believe I should use 
rawWrite to write compressed data and not writeExact.


You're welcome. If you manage to put GzipOut.finish() into the
destructor of the class to automatically flush the file upon
destruction of the object, let me know. I tried this and it gives
a SegFault… I was too lazy to try to understand it but I am sure
it must be in principle possible.

Stephan
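
A possible way around the destructor problem, sketched under assumptions: a class destructor run by the garbage collector may find its members already finalized, which would fit the crash described above, though the thread does not confirm the cause. The sketch below avoids finalizers entirely by using a struct whose finish() is called explicitly, for example through scope(exit). The name GzipWriter and its methods are hypothetical and not part of the module posted in this thread.

```d
import std.stdio : File;
import std.zlib : Compress, HeaderFormat;

/// Hypothetical writer (not from the thread): finish() is called
/// explicitly by the user instead of from a GC-run class destructor.
struct GzipWriter
{
    private Compress comp;
    private File f;
    private bool finished;

    this(string path)
    {
        f = File(path, "wb");
        comp = new Compress(HeaderFormat.gzip);
    }

    // Copying would duplicate the zlib state holder, so forbid it.
    @disable this(this);

    /// Compress `s` and write whatever zlib hands back so far.
    void put(const(char)[] s)
    {
        // .dup mirrors the s.dup used in the GzipOut class above
        f.rawWrite(cast(const(ubyte)[]) comp.compress(s.dup));
    }

    /// Write the remaining compressed data and close the file.
    /// Safe to call more than once.
    void finish()
    {
        if (finished)
            return;
        finished = true;
        f.rawWrite(cast(const(ubyte)[]) comp.flush());
        f.close();
    }
}
```

Typical use would be `auto w = GzipWriter("testout.gz"); scope(exit) w.finish(); w.put("bla bla bla\n");`, so the file is completed when the scope ends, even if an exception is thrown.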


Read and write gzip files easily.

2014-02-19 Thread Kamil Slowikowski
Hi there, I'm new to D and have a lot of learning ahead of me. It
would be extremely helpful to me if someone with D experience could
show me some code examples.

I'd like to neatly read and write gzipped files for my work. I have
read several threads on these forums on the topic of std.zlib or
std.zip and I haven't been able to figure it out.

Here's a Python script that does what I want. Can you please show me
an example in D that does the same thing?

<code>
#!/usr/bin/env python

import gzip

# Read a gzipped file and print the contents line by line.
with gzip.open("input.gz") as stream:
    for line in stream:
        print line

# Write some text to a gzipped file.
with gzip.open("output.gz", "w") as stream:
    stream.write("some output goes here\n")
</code>


I have a second request. I would like to start using D more in my
work, and in particular I would like to use and extend the BioD
library. Artem Tarasov made a nice module to handle BGZF, and I would
like to see an example like my Python code above using Artem's module.

Read more about BGZF:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

BioD:
https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d


Re: Read and write gzip files easily.

2014-02-19 Thread Artem Tarasov
Wow, that's unexpected :)

Unfortunately, there's no standard module for processing gzip/bz2. The
former can be dealt with using etc.c.zlib, but there's no convenient
interface for working with a file as a stream. Thus, the easiest way that I
know of is as follows:

import std.stdio, std.process;
auto pipe = pipeShell("gunzip -c " ~ filename); // replace with pigz if you wish
File input = pipe.stdout;
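
For reference, the pipe approach above could be wrapped up roughly as follows. This is only a sketch: the helper name linesFromTool and its `tool` parameter are made up for illustration, it assumes the decompressor (gunzip, pigz, ...) is installed and on the PATH, and it reads the whole output into memory.

```d
import std.process : escapeShellFileName, pipeShell, Redirect, wait;

/// Hypothetical helper (not from the thread): run `tool` on `filename`
/// in a shell and collect its standard output line by line.
string[] linesFromTool(string filename, string tool = "gunzip -c")
{
    // Quote the file name so spaces or shell metacharacters are safe.
    auto pipes = pipeShell(tool ~ " " ~ escapeShellFileName(filename),
                           Redirect.stdout);
    string[] lines;
    // byLine reuses its buffer, so keep an immutable copy of each line.
    foreach (line; pipes.stdout.byLine)
        lines ~= line.idup;
    wait(pipes.pid); // reap the child process
    return lines;
}
```

Calling `linesFromTool("input.gz")` would use gunzip, and `linesFromTool("input.gz", "pigz -dc")` would swap in pigz, assuming it is installed.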

Regarding your second request, this forum is not an appropriate place to
provide usage examples for a library, so that will go into a private e-mail.


On Wed, Feb 19, 2014 at 7:51 PM, Kamil Slowikowski
<kslowikow...@gmail.com> wrote:


 I have a second request. I would like to start using D more in my work,
 and in particular I would like to use and extend the BioD library. Artem
 Tarasov made a nice module to handle BGZF, and I would like to see an
 example like my Python code above using Artem's module.



Re: Read and write gzip files easily.

2014-02-19 Thread Craig Dillabaugh
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski 
wrote:
Hi there, I'm new to D and have a lot of learning ahead of me. 
It would
be extremely helpful to me if someone with D experience could 
show me

some code examples.

I'd like to neatly read and write gzipped files for my work. I 
have read
several threads on these forums on the topic of std.zlib or 
std.zip and I haven't been able to figure it out.


Here's a Python script that does what I want. Can you please 
show me

an example in D that does the same thing?

<code>
#!/usr/bin/env python

import gzip

# Read a gzipped file and print the contents line by line.
with gzip.open("input.gz") as stream:
    for line in stream:
        print line

# Write some text to a gzipped file.
with gzip.open("output.gz", "w") as stream:
    stream.write("some output goes here\n")
</code>


I have a second request. I would like to start using D more in 
my work,
and in particular I would like to use and extend the BioD 
library. Artem
Tarasov made a nice module to handle BGZF, and I would like to 
see an

example like my Python code above using Artem's module.

Read more about BGZF:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

BioD:
https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d


It is not part of the standard library, but you may want to have 
a look at the GzipInputStream in vibeD.


http://vibed.org/api/vibe.stream.zlib/GzipInputStream



Re: Read and write gzip files easily.

2014-02-19 Thread Craig Dillabaugh
On Wednesday, 19 February 2014 at 16:32:54 UTC, Craig Dillabaugh 
wrote:
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil 
Slowikowski wrote:


It is not part of the standard library, but you may want to 
have a look at the GzipInputStream in vibeD.


http://vibed.org/api/vibe.stream.zlib/GzipInputStream


Also meant to add, this thread belongs in the D.learn forum 
rather than here.




Re: Read and write gzip files easily.

2014-02-19 Thread Adam D. Ruppe
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov 
wrote:
Unfortunately, there's no standard module for processing 
gzip/bz2.


std.zlib handles gzip but it doesn't present a file nor range 
interface over it.


This will work though:

void main() {
    import std.zlib;
    import std.stdio;
    auto uc = new UnCompress();

    foreach(chunk; File("testd.gz").byChunk(1024)) {
        auto uncompressed = uc.uncompress(chunk);
        writeln(cast(string) uncompressed);
    }

    // also look at anything left in the buffer
    writeln(cast(string) uc.flush());
}


And if you are writing, use new Compress(HeaderFormat.gzip), then
call the compress method and write what it returns to the file.


Re: Read and write gzip files easily.

2014-02-19 Thread Artem Tarasov
Ah, indeed. I dismissed it because it allocates on each call, and heavy GC
usage in a multithreaded app is a performance killer.

On Wed, Feb 19, 2014 at 8:36 PM, Adam D. Ruppe <destructiona...@gmail.com> wrote:


 std.zlib handles gzip but it doesn't present a file nor range interface
 over it.



Re: Read and write gzip files easily.

2014-02-19 Thread Kamil Slowikowski
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov 
wrote:

the easiest way that I
know of is as follows:

import std.stdio, std.process;
auto pipe = pipeShell("gunzip -c " ~ filename); // replace with pigz if you wish
File input = pipe.stdout;


Artem, thank you! I've used a similar trick in the past with 
Python because calling the system's gzip or pigz in a 
subprocess.Pipe is faster than using the python gzip module. I'm 
very glad to see how easy it is in D.


Regarding your second request, this forum is not an appropriate 
place to
provide usage examples for a library, so that will go into a 
private e-mail.


Thanks, again! I'm looking forward to hearing from you :)


@Adam D. Ruppe
Thanks for your example! I couldn't find such an example anywhere 
on the web.



@Craig Dillabaugh
Please feel free to move the thread, sorry for posting in the 
wrong place.


Re: Read and write gzip files easily.

2014-02-19 Thread nazriel
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski 
wrote:
Hi there, I'm new to D and have a lot of learning ahead of me. 
It would
be extremely helpful to me if someone with D experience could 
show me

some code examples.

I'd like to neatly read and write gzipped files for my work. I 
have read
several threads on these forums on the topic of std.zlib or 
std.zip and I haven't been able to figure it out.


Here's a Python script that does what I want. Can you please 
show me

an example in D that does the same thing?

<code>
#!/usr/bin/env python

import gzip

# Read a gzipped file and print the contents line by line.
with gzip.open("input.gz") as stream:
    for line in stream:
        print line

# Write some text to a gzipped file.
with gzip.open("output.gz", "w") as stream:
    stream.write("some output goes here\n")
</code>


I have a second request. I would like to start using D more in 
my work,
and in particular I would like to use and extend the BioD 
library. Artem
Tarasov made a nice module to handle BGZF, and I would like to 
see an

example like my Python code above using Artem's module.

Read more about BGZF:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

BioD:
https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d


Witaj Kamil :)

Feel free to also visit #d channel on freenode IRC network.


Re: Read and write gzip files easily.

2014-02-19 Thread Craig Dillabaugh

@Craig Dillabaugh
Please feel free to move the thread, sorry for posting in the 
wrong place.


Actually, the thread can't be moved I believe, it is here forever.

Not a big deal though, lots of people new to D post questions 
here and miss the D.learn forum, so you are not alone. Since I 
didn't have a good answer to your original question I decided I 
should let you know about D.learn.





Re: Read and write gzip files easily.

2014-02-19 Thread Vladimir Panteleev
On Wednesday, 19 February 2014 at 16:36:29 UTC, Adam D. Ruppe 
wrote:
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov 
wrote:
Unfortunately, there's no standard module for processing 
gzip/bz2.


std.zlib handles gzip but it doesn't present a file nor range 
interface over it.


This will work though:

void main() {
    import std.zlib;
    import std.stdio;
    auto uc = new UnCompress();

    foreach(chunk; File("testd.gz").byChunk(1024)) {
        auto uncompressed = uc.uncompress(chunk);
        writeln(cast(string) uncompressed);
    }

    // also look at anything left in the buffer
    writeln(cast(string) uc.flush());
}



Regrettably, the above code has a bug. Currently, std.zlib stores 
a reference to the buffer passed to it, and since byChunk reuses 
the buffer, the code will fail when uncompressing multiple chunks.


Re: Read and write gzip files easily.

2014-02-19 Thread Kamil Slowikowski
On Wednesday, 19 February 2014 at 16:36:29 UTC, Adam D. Ruppe 
wrote:
And if you are writing, use new Compress(HeaderFormat.gzip),
then call the compress method and write what it returns to the
file.


I successfully read and printed the contents of a gzipped file, 
but the documentation is too sparse for me to figure out why I 
can't write a gzipped file.


http://dlang.org/phobos/std_zlib.html#.Compress

I'd appreciate any tips.


Here's the output:

- - -
$ echo -e "hi there\nhere's some text in a file\n-K" | gzip > test.gz


$ zcat test.gz
hi there
here's some text in a file
-K

$ ./zfile.d test.gz out.gz
hi there
here's some text in a file
-K

$ zcat out.gz

gzip: out.gz: unexpected end of file
- - -


And the code:

- - -
#!/usr/bin/env rdmd
// zfile.d
import std.stdio,
       std.stream,
       std.zlib,
       std.c.process,
       std.process,
       std.file;

void main(string[] args)
{
    if (args.length != 3) {
        writefln("Usage: ./%s file output", args[0]);
        exit(0);
    }

    // Read command line arguments.
    string filename = args[1];
    string outfile = args[2];
    auto len = filename.length;

    std.stdio.File input;
    // Automatically decompress the file if it ends with gz.
    if (filename[len - 2 .. len] == "gz") {
        auto pipe = pipeShell("gunzip -c " ~ filename);
        input = pipe.stdout;
    } else {
        input = std.stdio.File(filename);
    }

    // Write data to a stream in memory
    auto mem = new MemoryStream();
    string line;
    while ((line = input.readln()) !is null) {
        mem.write(line);
        // Also write the line to stdout.
        write(line);
    }

    // Put the uncompressed data into a new gz file.
    auto comp = new Compress(HeaderFormat.gzip);
    auto compressed = comp.compress(mem.data);
    //comp.flush(); // Does not fix the problem.

    // See the raw compressed bytes.
    //writeln(cast(ubyte[])compressed);

    // Write compressed output to a file.
    with (new std.stream.File(outfile, FileMode.OutNew)) {
        writeExact(compressed.ptr, compressed.length);
        //write(cast(ubyte[])compressed); // Also does not work.
    }
}
- - -


Re: Read and write gzip files easily.

2014-02-19 Thread Adam D. Ruppe
On Thursday, 20 February 2014 at 03:58:01 UTC, Kamil Slowikowski 
wrote:

auto compressed = comp.compress(mem.data);
//comp.flush(); // Does not fix the problem.


You need to write each compressed block and then the flush. So more
like:


writeToFile(comp.compress(mem.data)); // loop over all the data btw
writeToFile(comp.flush());

and that should do it.


flush returns the remainder of the data.
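
Spelled out with std.stdio.File, the compress-then-flush pattern might look like the sketch below. writeGzip is a hypothetical name, and the cast to const(ubyte)[] is just a cautious way to pass the untyped result to rawWrite.

```d
import std.stdio : File;
import std.zlib : Compress, HeaderFormat;

/// Hypothetical helper (not from the thread): gzip `lines` into the
/// file at `path`, writing every piece that zlib returns.
void writeGzip(string path, const(string)[] lines)
{
    auto comp = new Compress(HeaderFormat.gzip);
    auto f = File(path, "wb");
    foreach (line; lines)
    {
        // compress may return little or nothing while zlib is still
        // buffering input; write whatever comes back.
        f.rawWrite(cast(const(ubyte)[]) comp.compress(line));
    }
    // flush returns the remainder of the data; without it the
    // output is truncated and zcat reports "unexpected end of file".
    f.rawWrite(cast(const(ubyte)[]) comp.flush());
}
```

The File is closed when `f` goes out of scope at the end of the function.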


Re: Read and write gzip files easily.

2014-02-19 Thread Kamil Slowikowski
On Thursday, 20 February 2014 at 04:03:45 UTC, Adam D. Ruppe 
wrote:
On Thursday, 20 February 2014 at 03:58:01 UTC, Kamil 
Slowikowski wrote:

   auto compressed = comp.compress(mem.data);
   //comp.flush(); // Does not fix the problem.


You need to write each compressed block and then the flush. So more
like:


writeToFile(comp.compress(mem.data)); // loop over all the data btw
writeToFile(comp.flush());

and that should do it.


flush returns the remainder of the data.


Hey Adam, thanks for the tip.

Next problem: the output has strange characters, as shown:

- - -
./zfile.d test.gz out.gz
hi there
here's some text in a file
-K

Thu Feb 20 00:07:52 kamil W530 ~/work/dlang
zcat out.gz
hi there
here's some text in a file
-K

zcat test.gz | wc -c
39

zcat out.gz | wc -c
63

zcat test.gz | hexdump
0000000 6968 7420 6568 6572 680a 7265 2765 2073
0000010 6f73 656d 7420 7865 2074 6e69 6120 6620
0000020 6c69 0a65 4b2d 000a
0000027

zcat out.gz | hexdump
0000000 0009 0000 0000 0000 6968 7420 6568 6572
0000010 1b0a 0000 0000 0000 6800 7265 2765 2073
0000020 6f73 656d 7420 7865 2074 6e69 6120 6620
0000030 6c69 0a65 0003 0000 0000 0000 4b2d 000a
000003f
- - -


Code:

- - -
#!/usr/bin/env rdmd
import std.stdio,
       std.stream,
       std.zlib,
       std.c.process,
       std.process,
       std.file;

void main(string[] args)
{
    if (args.length != 3) {
        writefln("Usage: ./%s file output", args[0]);
        exit(0);
    }

    // Read command line arguments.
    string filename = args[1];
    string outfile = args[2];
    auto len = filename.length;

    std.stdio.File input;
    // Automatically decompress the file if it ends with gz.
    if (filename[len - 2 .. len] == "gz") {
        auto pipe = pipeShell("gunzip -c " ~ filename);
        input = pipe.stdout;
    } else {
        input = std.stdio.File(filename);
    }

    // Write data to a stream in memory
    auto mem = new MemoryStream();
    string line;
    while ((line = input.readln()) !is null) {
        mem.write(line);
        // Also write the line to stdout.
        write(line);
    }

    // Put the data into a new gz file.
    auto comp = new Compress(HeaderFormat.gzip);
    // See the raw compressed bytes.
    //writeln(cast(ubyte[])compressed);

    // Write compressed output to a file.
    with (new std.stream.File(outfile, FileMode.OutNew)) {
        auto compressed = comp.compress(mem.data);
        writeExact(compressed.ptr, compressed.length);
        // Get any remaining data.
        compressed = comp.flush();
        writeExact(compressed.ptr, compressed.length);
    }
}
- - -
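
A note on the extra bytes, offered as a reading of the dump rather than a confirmed diagnosis: the three unexpected groups start with 0009, 1b (27) and 0003, which are exactly the lengths of the three lines, each stored in 8 bytes. That would fit mem.write(line) storing every string together with its length, in which case writing only the characters (std.stream has writeString for that) or skipping MemoryStream altogether should make the output match. The arithmetic can be checked with a small sketch; prefixedSize is a made-up helper and the 8-byte prefix assumes a 64-bit build.

```d
/// Hypothetical helper (not from the thread): total size in bytes if
/// every string is stored behind a fixed-size length prefix.
/// The default of 8 models size_t on a 64-bit build (an assumption).
size_t prefixedSize(const(string)[] lines, size_t prefixBytes = 8)
{
    size_t total = 0;
    foreach (line; lines)
        total += prefixBytes + line.length;
    return total;
}
```

For the three lines in test.gz the payload alone is 9 + 27 + 3 = 39 bytes, and three 8-byte prefixes bring it to 63, matching the two `wc -c` results above.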