Re: Read and write gzip files easily.
On Sunday, 3 May 2015 at 14:35:49 UTC, Per Nordlöw wrote:
> Latest at https://github.com/nordlow/justd/blob/master/zio.d

Should be https://github.com/nordlow/phobos-next/blob/master/src/zio.d
Re: Read and write gzip files easily.
And there is Zipios++: http://zipios.sourceforge.net/

On Sun, 2015-05-03 at 14:33 +0000, via Digitalmars-d wrote:
> On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels wrote:
>> Hi Kamil, I am glad someone has the exact same problem as I had. I actually solved this, inspired by the Python API you quoted above. I wrote these classes: GzipInputRange, GzipByLine, and GzipOut. Here is how I can now use them:
>
> I've polished your module a bit at: https://github.com/nordlow/justd/blob/611ae3aac35a085af966e0c3b717deb0012f637b/zio.d
>
> Reflections:
> - Performance is terrible even with -release -noboundscheck -unittest: about 20 times slower than zcat $F | wc -l. I'm guessing _chunkRange.front.dup slows things down. I tried removing the .dup, but then I get std.zlib.ZlibException@std/zlib.d(59): data error. I don't believe we should have to copy _chunkRange.front, but I can't figure out how to solve it. Does anybody understand how to fix this?
> - Shouldn't GzipOut.finish() call this.close()? Otherwise the file remains unflushed.
> - And what about calling this.close() in GzipOut.~this()? Is that needed too?

-- 
Russel.
Dr Russel Winder
Re: Read and write gzip files easily.
On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels wrote:
> Hi Kamil, I am glad someone has the exact same problem as I had. I actually solved this, inspired by the Python API you quoted above. I wrote these classes: GzipInputRange, GzipByLine, and GzipOut. Here is how I can now use them:

I've polished your module a bit at: https://github.com/nordlow/justd/blob/611ae3aac35a085af966e0c3b717deb0012f637b/zio.d

Reflections:
- Performance is terrible even with -release -noboundscheck -unittest: about 20 times slower than zcat $F | wc -l. I'm guessing _chunkRange.front.dup slows things down. I tried removing the .dup, but then I get std.zlib.ZlibException@std/zlib.d(59): data error. I don't believe we should have to copy _chunkRange.front, but I can't figure out how to solve it. Does anybody understand how to fix this?
- Shouldn't GzipOut.finish() call this.close()? Otherwise the file remains unflushed.
- And what about calling this.close() in GzipOut.~this()? Is that needed too?
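On the performance question: one likely cost in the GzipInputRange/GzipByLine design is that every byte goes through front/popFront and every line is built with `buf ~= range.front`, one character at a time. Below is a minimal sketch of a different approach, assuming the compressed data arrives as a range of `ubyte[]` chunks (for example from `File.byChunk`): each chunk is decompressed by a single `UnCompress` object and lines are cut out of the decompressed block with `indexOf`, so only the unfinished tail of a block is copied. `eachGzipLine` is a hypothetical name and this has not been benchmarked against `zcat`. Each chunk is still copied with `.dup`, on the assumption that the caller (like `byChunk`) may reuse the buffer it hands out.

```d
import std.string : indexOf;
import std.zlib : HeaderFormat, UnCompress;

/// Hypothetical helper (not part of Phobos or of Stephan's module):
/// decompresses gzip data given as a range of ubyte[] chunks and calls
/// `sink` once per line, without the trailing '\n'.
void eachGzipLine(R)(R chunks, scope void delegate(const(char)[]) sink)
{
    auto uc = new UnCompress(HeaderFormat.gzip);
    char[] pending; // start of a line that continues in the next block

    // Appends one decompressed block and emits every complete line in it.
    void feed(const(void)[] block)
    {
        pending ~= cast(const(char)[]) block;
        size_t start = 0;
        while (true)
        {
            immutable nl = pending[start .. $].indexOf('\n');
            if (nl < 0)
                break;
            immutable end = start + cast(size_t) nl;
            sink(pending[start .. end]);
            start = end + 1;
        }
        // Keep only the unfinished tail for the next block.
        pending = pending[start .. $].dup;
    }

    foreach (chunk; chunks)
        feed(uc.uncompress(chunk.dup)); // copy: the caller may reuse the buffer
    feed(uc.flush());
    if (pending.length)
        sink(pending); // last line had no trailing '\n'
}
```

With a file this would be called as `eachGzipLine(File("test.gz", "rb").byChunk(0x4000), (const(char)[] line) { writeln(line); });`. The slice passed to `sink` should be copied with `.idup` if the caller wants to keep it.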
Re: Read and write gzip files easily.
> I've polished your module a bit at: https://github.com/nordlow/justd/blob/611ae3aac35a085af966e0c3b717deb0012f637b/zio.d

Latest at https://github.com/nordlow/justd/blob/master/zio.d
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
> Hi there, I'm new to D and have a lot of learning ahead of me. It would be extremely helpful to me if someone with D experience could show me some code examples. I'd like to neatly read and write gzipped files for my work. I have read several threads on these forums on the topic of std.zlib or std.zip and I haven't been able to figure it out.

Hi Kamil, I am glad someone has the exact same problem as I had. I actually solved this, inspired by the Python API you quoted above. I wrote these classes: GzipInputRange, GzipByLine, and GzipOut. Here is how I can now use them:

_
import gzip;
import std.stdio;

void main() {
    auto byLine = new GzipByLine("test.gz");
    foreach(line; byLine)
        writeln(line);

    auto gzipOutFile = new GzipOut("testout.gz");
    gzipOutFile.compress("bla bla bla");
    gzipOutFile.finish();
}
_

That is all quite convenient, and I was wondering whether something like that would be useful even in Phobos. But it's clear that for Phobos things would involve a lot more work to comply with the requirements.
This so far simply served my needs and is not as generic as it could be. Here is the code:

___gzip.d___
import std.zlib;
import std.stdio;
import std.range;
import std.traits;

class GzipInputRange {
    UnCompress uncompressObj;
    File f;
    auto CHUNKSIZE = 0x4000;
    ReturnType!(f.byChunk) chunkRange;
    bool exhausted;
    char[] uncompressedBuffer;
    size_t bufferIndex;

    this(string filename) {
        f = File(filename, "rb"); // binary mode, so no newline translation on Windows
        chunkRange = f.byChunk(CHUNKSIZE);
        uncompressObj = new UnCompress();
        load();
    }

    void load() {
        if(!chunkRange.empty) {
            // .dup copies the chunk, because byChunk reuses its buffer
            auto raw = chunkRange.front.dup;
            chunkRange.popFront();
            uncompressedBuffer = cast(char[])uncompressObj.uncompress(raw);
            bufferIndex = 0;
        }
        else {
            if(!exhausted) {
                uncompressedBuffer = cast(char[])uncompressObj.flush();
                exhausted = true;
                bufferIndex = 0;
            }
            else
                uncompressedBuffer.length = 0;
        }
    }

    @property char front() {
        return uncompressedBuffer[bufferIndex];
    }

    void popFront() {
        bufferIndex += 1;
        if(bufferIndex >= uncompressedBuffer.length) {
            load();
            bufferIndex = 0;
        }
    }

    @property bool empty() {
        return uncompressedBuffer.length == 0;
    }
}

class GzipByLine {
    GzipInputRange range;
    char[] buf;
    bool done; // separate flag, so that a blank line does not end the iteration

    this(string filename) {
        this.range = new GzipInputRange(filename);
        popFront();
    }

    @property bool empty() {
        return done;
    }

    void popFront() {
        if(range.empty) {
            done = true;
            return;
        }
        buf.length = 0;
        while(!range.empty && range.front != '\n') {
            buf ~= range.front;
            range.popFront();
        }
        if(!range.empty)
            range.popFront(); // skip the '\n'
    }

    @property string front() {
        return buf.idup;
    }
}

class GzipOut {
    Compress compressObj;
    File f;

    this(string filename) {
        f = File(filename, "wb");
        compressObj = new Compress(HeaderFormat.gzip);
    }

    void compress(string s) {
        auto compressed = compressObj.compress(s.dup);
        f.rawWrite(compressed);
    }

    void finish() {
        auto compressed = compressObj.flush();
        f.rawWrite(compressed);
    }
}
Re: Read and write gzip files easily.
On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels wrote:
> Hi Kamil, I am glad someone has the exact same problem as I had. I actually solved this, inspired by the Python API you quoted above. I wrote these classes: GzipInputRange, GzipByLine, and GzipOut.

Stephan, awesome! Thank you very much for sharing your classes. It's nice to see how you've approached this problem. Your code is very clear and easy to understand (for me). Also, I now see the error in my code: I believe I should use rawWrite to write compressed data and not writeExact.
Re: Read and write gzip files easily.
On Thu, Feb 20, 2014 at 9:05 PM, Kamil Slowikowski <kslowikow...@gmail.com> wrote:
> Also, I now see the error in my code: I believe I should use rawWrite to write compressed data and not writeExact.

That's not an error; those are two different ways to access files, std.stream.File and std.stdio.File, and the latter is the recommended one.
Re: Read and write gzip files easily.
On Thursday, 20 February 2014 at 17:05:37 UTC, Kamil Slowikowski wrote:
> On Thursday, 20 February 2014 at 10:35:50 UTC, Stephan Schiffels wrote:
>> Hi Kamil, I am glad someone has the exact same problem as I had. I actually solved this, inspired by the Python API you quoted above. I wrote these classes: GzipInputRange, GzipByLine, and GzipOut.
>
> Stephan, awesome! Thank you very much for sharing your classes. It's nice to see how you've approached this problem. Your code is very clear and easy to understand (for me). Also, I now see the error in my code: I believe I should use rawWrite to write compressed data and not writeExact.

You're welcome. If you manage to put GzipOut.finish() into the destructor of the class, so that the file is flushed automatically when the object is destroyed, let me know. I tried this and it gives a segfault… I was too lazy to try to understand it, but I am sure it must be possible in principle.

Stephan
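A plausible cause of the segfault: the D language specification says the GC calls class destructors in an unspecified order (and is not even guaranteed to call them), so references from the object to other GC-allocated objects, here the Compress instance, may no longer be valid when the destructor runs. A common workaround is to make the writer a struct, whose destructor runs deterministically when the value goes out of scope. The sketch below is hypothetical (`GzipWriter` is not part of Phobos or of Stephan's module); it also has `finish()` close the file, as Per suggested, and makes a second `finish()` a no-op.

```d
import std.stdio : File;
import std.zlib : Compress, HeaderFormat;

/// Hypothetical struct-based variant of GzipOut. Because it is a struct,
/// its destructor runs when the value leaves scope, not whenever the GC
/// gets around to finalizing it.
struct GzipWriter
{
    private File file;
    private Compress comp;
    private bool finished;

    this(string filename)
    {
        file = File(filename, "wb");
        comp = new Compress(HeaderFormat.gzip);
    }

    // Copying would let two owners flush the same stream, so forbid it.
    @disable this(this);

    /// Compresses `s` and writes whatever compressed bytes are ready.
    void put(string s)
    {
        file.rawWrite(cast(const(ubyte)[]) comp.compress(s));
    }

    /// Writes the rest of the stream (final block plus gzip trailer)
    /// and closes the file. Calling it again does nothing.
    void finish()
    {
        if (finished || !file.isOpen)
            return;
        file.rawWrite(cast(const(ubyte)[]) comp.flush());
        file.close();
        finished = true;
    }

    ~this()
    {
        finish();
    }
}
```

Usage would be `{ auto w = GzipWriter("testout.gz"); w.put("bla bla bla"); }`, with the data flushed and the file closed at the closing brace. This only helps while the struct lives on the stack or inside another struct; stored inside a class, it would again depend on GC finalization.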
Read and write gzip files easily.
Hi there, I'm new to D and have a lot of learning ahead of me. It would be extremely helpful to me if someone with D experience could show me some code examples. I'd like to neatly read and write gzipped files for my work. I have read several threads on these forums on the topic of std.zlib or std.zip and I haven't been able to figure it out.

Here's a Python script that does what I want. Can you please show me an example in D that does the same thing?

code
#!/usr/bin/env python
import gzip

# Read a gzipped file and print the contents line by line.
with gzip.open("input.gz") as stream:
    for line in stream:
        print line

# Write some text to a gzipped file.
with gzip.open("output.gz", "w") as stream:
    stream.write("some output goes here\n")
/code

I have a second request. I would like to start using D more in my work, and in particular I would like to use and extend the BioD library. Artem Tarasov made a nice module to handle BGZF, and I would like to see an example like my Python code above using Artem's module.

Read more about BGZF: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
BioD: https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d
Re: Read and write gzip files easily.
Wow, that's unexpected :)

Unfortunately, there's no standard module for processing gzip/bz2. The former can be dealt with using etc.c.zlib, but there's no convenient interface for working with a file as a stream. Thus, the easiest way that I know of is as follows:

import std.stdio, std.process;
auto pipe = pipeShell("gunzip -c " ~ filename); // replace with pigz if you wish
File input = pipe.stdout;

Regarding your second request, this forum is not an appropriate place to provide usage examples for a library, so that will go into a private e-mail.

On Wed, Feb 19, 2014 at 7:51 PM, Kamil Slowikowski <kslowikow...@gmail.com> wrote:
> I have a second request. I would like to start using D more in my work, and in particular I would like to use and extend the BioD library. Artem Tarasov made a nice module to handle BGZF, and I would like to see an example like my Python code above using Artem's module.
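The pipe approach extends to writing as well. A minimal sketch, assuming `gzip` and `gunzip` are on the PATH and that `pipeShell` runs the command through a shell that understands `>` redirection; the helper names are hypothetical. `std.process.escapeShellFileName` quotes the file name, so spaces or shell metacharacters in it are not interpreted by the shell.

```d
import std.exception : enforce;
import std.process : escapeShellFileName, pipeShell, Redirect, wait;

/// Reads a gzip file line by line through an external `gunzip -c`
/// (`pigz -dc` would work the same way).
string[] readLinesViaGunzip(string filename)
{
    auto p = pipeShell("gunzip -c " ~ escapeShellFileName(filename), Redirect.stdout);
    string[] lines;
    foreach (line; p.stdout.byLine)
        lines ~= line.idup; // byLine reuses its buffer, so copy each line
    enforce(wait(p.pid) == 0, "gunzip failed");
    return lines;
}

/// Writes `text` to a gzip file by piping it into an external `gzip -c`.
void writeViaGzip(string filename, string text)
{
    auto p = pipeShell("gzip -c > " ~ escapeShellFileName(filename), Redirect.stdin);
    p.stdin.write(text);
    p.stdin.close(); // gzip finishes the stream only once its input is closed
    enforce(wait(p.pid) == 0, "gzip failed");
}
```

Only the needed stream is redirected in each helper, so the child's stderr still goes to the terminal instead of filling an unread pipe.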
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
> Hi there, I'm new to D and have a lot of learning ahead of me. It would be extremely helpful to me if someone with D experience could show me some code examples. I'd like to neatly read and write gzipped files for my work. I have read several threads on these forums on the topic of std.zlib or std.zip and I haven't been able to figure it out.
>
> Here's a Python script that does what I want. Can you please show me an example in D that does the same thing?
>
> code
> #!/usr/bin/env python
> import gzip
>
> # Read a gzipped file and print the contents line by line.
> with gzip.open("input.gz") as stream:
>     for line in stream:
>         print line
>
> # Write some text to a gzipped file.
> with gzip.open("output.gz", "w") as stream:
>     stream.write("some output goes here\n")
> /code
>
> I have a second request. I would like to start using D more in my work, and in particular I would like to use and extend the BioD library. Artem Tarasov made a nice module to handle BGZF, and I would like to see an example like my Python code above using Artem's module.
>
> Read more about BGZF: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
> BioD: https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d

It is not part of the standard library, but you may want to have a look at the GzipInputStream in vibe.d: http://vibed.org/api/vibe.stream.zlib/GzipInputStream
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 16:32:54 UTC, Craig Dillabaugh wrote:
> On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
>
> It is not part of the standard library, but you may want to have a look at the GzipInputStream in vibe.d: http://vibed.org/api/vibe.stream.zlib/GzipInputStream

Also meant to add: this thread belongs in the D.learn forum rather than here.
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov wrote:
> Unfortunately, there's no standard module for processing gzip/bz2.

std.zlib handles gzip, but it doesn't present a file or range interface over it. This will work though:

void main() {
    import std.zlib;
    import std.stdio;

    auto uc = new UnCompress();
    foreach(chunk; File("testd.gz").byChunk(1024)) {
        auto uncompressed = uc.uncompress(chunk);
        writeln(cast(string) uncompressed);
    }

    // also look at anything left in the buffer
    writeln(cast(string) uc.flush());
}

And if you are writing, use new Compress(HeaderFormat.gzip), then call the compress method and write what it returns to the file.
Re: Read and write gzip files easily.
Ah, indeed. I dismissed it because it allocates on each call, and heavy GC usage in a multithreaded app is a performance killer.

On Wed, Feb 19, 2014 at 8:36 PM, Adam D. Ruppe <destructiona...@gmail.com> wrote:
> std.zlib handles gzip, but it doesn't present a file or range interface over it.
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov wrote:
> the easiest way that I know of is as follows:
>
> import std.stdio, std.process;
> auto pipe = pipeShell("gunzip -c " ~ filename); // replace with pigz if you wish
> File input = pipe.stdout;

Artem, thank you! I've used a similar trick in the past with Python, because calling the system's gzip or pigz through a subprocess.PIPE is faster than using the Python gzip module. I'm very glad to see how easy it is in D.

> Regarding your second request, this forum is not an appropriate place to provide usage examples for a library, so that will go into a private e-mail.

Thanks again! I'm looking forward to hearing from you :)

@Adam D. Ruppe
Thanks for your example! I couldn't find such an example anywhere on the web.

@Craig Dillabaugh
Please feel free to move the thread, sorry for posting in the wrong place.
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
> Hi there, I'm new to D and have a lot of learning ahead of me. It would be extremely helpful to me if someone with D experience could show me some code examples. I'd like to neatly read and write gzipped files for my work. I have read several threads on these forums on the topic of std.zlib or std.zip and I haven't been able to figure it out.
>
> Here's a Python script that does what I want. Can you please show me an example in D that does the same thing?
>
> code
> #!/usr/bin/env python
> import gzip
>
> # Read a gzipped file and print the contents line by line.
> with gzip.open("input.gz") as stream:
>     for line in stream:
>         print line
>
> # Write some text to a gzipped file.
> with gzip.open("output.gz", "w") as stream:
>     stream.write("some output goes here\n")
> /code
>
> I have a second request. I would like to start using D more in my work, and in particular I would like to use and extend the BioD library. Artem Tarasov made a nice module to handle BGZF, and I would like to see an example like my Python code above using Artem's module.
>
> Read more about BGZF: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
> BioD: https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d

Welcome, Kamil :) Feel free to also visit the #d channel on the freenode IRC network.
Re: Read and write gzip files easily.
> @Craig Dillabaugh
> Please feel free to move the thread, sorry for posting in the wrong place.

Actually, I believe the thread can't be moved; it is here forever. Not a big deal though: lots of people new to D post questions here and miss the D.learn forum, so you are not alone. Since I didn't have a good answer to your original question, I decided I should let you know about D.learn.
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 16:36:29 UTC, Adam D. Ruppe wrote:
> On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov wrote:
>> Unfortunately, there's no standard module for processing gzip/bz2.
>
> std.zlib handles gzip, but it doesn't present a file or range interface over it. This will work though:
>
> void main() {
>     import std.zlib;
>     import std.stdio;
>
>     auto uc = new UnCompress();
>     foreach(chunk; File("testd.gz").byChunk(1024)) {
>         auto uncompressed = uc.uncompress(chunk);
>         writeln(cast(string) uncompressed);
>     }
>
>     // also look at anything left in the buffer
>     writeln(cast(string) uc.flush());
> }

Regrettably, the above code has a bug. Currently, std.zlib stores a reference to the buffer passed to it, and since byChunk reuses the buffer, the code will fail when uncompressing multiple chunks.
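Following the bug described above, a minimal corrected variant of the loop copies each chunk with `.dup` before handing it to `UnCompress`, so a later `byChunk` read cannot overwrite input that std.zlib may still be referencing. (Newer Phobos releases may no longer keep such a reference, but the copy is cheap insurance.) It also collects the output instead of calling `writeln` per chunk, since `writeln` appends a newline after every chunk. `gunzipFile` is a hypothetical name.

```d
import std.stdio : File;
import std.zlib : HeaderFormat, UnCompress;

/// Hypothetical helper: decompresses a whole gzip file, reading it in
/// chunks of `chunkSize` bytes.
string gunzipFile(string filename, size_t chunkSize = 4096)
{
    auto uc = new UnCompress(HeaderFormat.gzip);
    char[] result;
    foreach (chunk; File(filename, "rb").byChunk(chunkSize))
    {
        // byChunk reuses one buffer for every chunk, so pass a private copy.
        result ~= cast(const(char)[]) uc.uncompress(chunk.dup);
    }
    // Whatever is still buffered inside zlib.
    result ~= cast(const(char)[]) uc.flush();
    return result.idup;
}
```

Reading everything into memory keeps the sketch short; for large files the per-chunk output would be written out or split into lines instead of accumulated.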
Re: Read and write gzip files easily.
On Wednesday, 19 February 2014 at 16:36:29 UTC, Adam D. Ruppe wrote:
> And if you are writing, use new Compress(HeaderFormat.gzip), then call the compress method and write what it returns to the file.

I successfully read and printed the contents of a gzipped file, but the documentation is too sparse for me to figure out why I can't write a gzipped file.

http://dlang.org/phobos/std_zlib.html#.Compress

I'd appreciate any tips. Here's the output:

- - -
$ echo -e "hi there\nhere's some text in a file\n-K" | gzip > test.gz
$ zcat test.gz
hi there
here's some text in a file
-K
$ ./zfile.d test.gz out.gz
hi there
here's some text in a file
-K
$ zcat out.gz
gzip: out.gz: unexpected end of file
- - -

And the code:

- - -
#!/usr/bin/env rdmd
// zfile.d
import std.stdio, std.stream, std.zlib, std.c.process, std.process, std.file;

void main(string[] args) {
    if (args.length != 3) {
        writefln("Usage: ./%s file output", args[0]);
        exit(0);
    }

    // Read command line arguments.
    string filename = args[1];
    string outfile = args[2];
    auto len = filename.length;

    std.file.File input;

    // Automatically decompress the file if it ends with gz.
    if (filename[len - 2 .. len] == "gz") {
        auto pipe = pipeShell("gunzip -c " ~ filename);
        input = pipe.stdout;
    }
    else {
        input = std.stdio.File(filename);
    }

    // Write data to a stream in memory
    auto mem = new MemoryStream();
    string line;
    while ((line = input.readln()) !is null) {
        mem.write(line);
        // Also write the line to stdout.
        write(line);
    }

    // Put the uncompressed data into a new gz file.
    auto comp = new Compress(HeaderFormat.gzip);
    auto compressed = comp.compress(mem.data);
    //comp.flush(); // Does not fix the problem.

    // See the raw compressed bytes.
    //writeln(cast(ubyte[])compressed);

    // Write compressed output to a file.
    with (new std.stream.File(outfile, FileMode.OutNew)) {
        writeExact(compressed.ptr, compressed.length);
        //write(cast(ubyte[])compressed); // Also does not work.
    }
}
- - -
Re: Read and write gzip files easily.
On Thursday, 20 February 2014 at 03:58:01 UTC, Kamil Slowikowski wrote:
> auto compressed = comp.compress(mem.data);
> //comp.flush(); // Does not fix the problem.

You need to write each compressed block and then the flush. So more like:

writeToFile(comp.compress(mem.data)); // loop over all the data btw
writeToFile(comp.flush());

and that should do it. flush returns the remainder of the data.
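Adam's two steps, written out as a complete function: a minimal sketch, assuming the data arrives as a list of strings (`writeGzip` is a hypothetical name). Every `compress()` result is written as soon as it is returned, and `flush()` supplies the remainder of the stream, including the gzip trailer; leaving out the `flush()` output is what produces the "unexpected end of file" error from zcat shown earlier. The bytes go through `std.stdio.File.rawWrite`, in line with the advice to prefer std.stdio over std.stream.

```d
import std.stdio : File;
import std.zlib : Compress, HeaderFormat;

/// Hypothetical helper: compresses `pieces` into a single gzip file.
void writeGzip(string filename, string[] pieces)
{
    auto comp = new Compress(HeaderFormat.gzip);
    auto f = File(filename, "wb");
    foreach (piece; pieces)
    {
        // compress() may return nothing yet; zlib buffers small inputs.
        f.rawWrite(cast(const(ubyte)[]) comp.compress(piece));
    }
    // Final deflate block plus the gzip trailer (CRC-32 and length).
    f.rawWrite(cast(const(ubyte)[]) comp.flush());
    f.close();
}
```

Calling `writeGzip("output.gz", ["some output goes here\n"])` corresponds to the write half of the Python script at the top of the thread.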
Re: Read and write gzip files easily.
On Thursday, 20 February 2014 at 04:03:45 UTC, Adam D. Ruppe wrote:
> On Thursday, 20 February 2014 at 03:58:01 UTC, Kamil Slowikowski wrote:
>> auto compressed = comp.compress(mem.data);
>> //comp.flush(); // Does not fix the problem.
>
> You need to write each compressed block and then the flush. So more like:
>
> writeToFile(comp.compress(mem.data)); // loop over all the data btw
> writeToFile(comp.flush());
>
> and that should do it. flush returns the remainder of the data.

Hey Adam, thanks for the tip. Next problem: the output has strange characters, as shown:

- - -
$ ./zfile.d test.gz out.gz
hi there
here's some text in a file
-K
$ zcat out.gz
hi there
here's some text in a file
-K
$ zcat test.gz | wc -c
39
$ zcat out.gz | wc -c
63
$ zcat test.gz | hexdump
0000000 6968 7420 6568 6572 680a 7265 2765 2073
0000010 6f73 656d 7420 7865 2074 6e69 6120 6620
0000020 6c69 0a65 4b2d 000a
0000027
$ zcat out.gz | hexdump
0000000 0009 0000 0000 0000 6968 7420 6568 6572
0000010 1b0a 0000 0000 0000 6800 7265 2765 2073
0000020 6f73 656d 7420 7865 2074 6e69 6120 6620
0000030 6c69 0a65 0003 0000 0000 0000 4b2d 000a
000003f
- - -

Code:

- - -
#!/usr/bin/env rdmd
import std.stdio, std.stream, std.zlib, std.c.process, std.process, std.file;

void main(string[] args) {
    if (args.length != 3) {
        writefln("Usage: ./%s file output", args[0]);
        exit(0);
    }

    // Read command line arguments.
    string filename = args[1];
    string outfile = args[2];
    auto len = filename.length;

    std.file.File input;

    // Automatically decompress the file if it ends with gz.
    if (filename[len - 2 .. len] == "gz") {
        auto pipe = pipeShell("gunzip -c " ~ filename);
        input = pipe.stdout;
    }
    else {
        input = std.stdio.File(filename);
    }

    // Write data to a stream in memory
    auto mem = new MemoryStream();
    string line;
    while ((line = input.readln()) !is null) {
        mem.write(line);
        // Also write the line to stdout.
        write(line);
    }

    // Put the data into a new gz file.
    auto comp = new Compress(HeaderFormat.gzip);

    // See the raw compressed bytes.
    //writeln(cast(ubyte[])compressed);

    // Write compressed output to a file.
    with (new std.stream.File(outfile, FileMode.OutNew)) {
        auto compressed = comp.compress(mem.data);
        writeExact(compressed.ptr, compressed.length);

        // Get any remaining data.
        compressed = comp.flush();
        writeExact(compressed.ptr, compressed.length);
    }
}
- - -
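Comparing the two hexdumps, out.gz decompresses to 24 extra bytes (63 - 39), and they sit in front of each line as an 8-byte little-endian value equal to that line's length: 0x09, 0x1b and 0x03 for the 9-, 27- and 3-byte lines. That suggests `MemoryStream.write(string)` stores each string's length along with its characters. std.stream has since been deprecated and removed from Phobos, so the sketch below sidesteps it and feeds each line straight into `Compress`. `gzipLines` is a hypothetical name; it takes a `File`, so it works with either `pipe.stdout` or a plain file.

```d
import std.stdio : File, KeepTerminator, stdout;
import std.zlib : Compress, HeaderFormat;

/// Hypothetical replacement for the MemoryStream part of zfile.d: every
/// line of `input` is echoed to stdout and fed straight to Compress, so
/// only the line's characters (no length prefix) reach the gzip stream.
void gzipLines(File input, string outfile)
{
    auto comp = new Compress(HeaderFormat.gzip);
    auto output = File(outfile, "wb");
    // byLineCopy hands out a fresh string per line; plain byLine reuses
    // one buffer, which Compress might still be referencing.
    foreach (line; input.byLineCopy(KeepTerminator.yes))
    {
        stdout.write(line);
        output.rawWrite(cast(const(ubyte)[]) comp.compress(line));
    }
    output.rawWrite(cast(const(ubyte)[]) comp.flush());
    output.close();
}
```

In zfile.d, a call to `gzipLines(input, outfile);` would take the place of everything from `auto mem = new MemoryStream();` to the end of `main`.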