Re: GzipOutputStream is slower than gzipping the file by hand
Here is the patch. I included parameters for both the compression level and the compression strategy, both of them defaulting to the same thing the gzip binary uses.

--Pete

On Wed, Jul 15, 2009 at 4:37 PM, Kenton Varda wrote:
> OK. Make sure that the parameter is optional, with the default matching
> gzip's default. Thanks.

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/protobuf?hl=en -~--~~~~--~~--~--~---

Attachment: gzip-compression-level.patch
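The patch threads two zlib settings through, a level and a strategy, so a caller can now pass values that zlib would refuse. A minimal sketch of an up-front check, assuming both settings arrive as plain ints: the constants copy the numeric values of the zlib.h macros named in the comments so the snippet builds without zlib, and `IsValidDeflateParams` is a hypothetical name, not part of the patch or of protobuf.

```cpp
// Numeric values copied from zlib.h so this sketch compiles without zlib.
constexpr int kZDefaultCompression = -1;  // Z_DEFAULT_COMPRESSION
constexpr int kZNoCompression = 0;        // Z_NO_COMPRESSION
constexpr int kZBestCompression = 9;      // Z_BEST_COMPRESSION
constexpr int kZDefaultStrategy = 0;      // Z_DEFAULT_STRATEGY
constexpr int kZFixed = 4;                // Z_FIXED, highest-numbered strategy

// Hypothetical helper: checks a (level, strategy) pair against the ranges
// zlib.h documents for deflateInit2(). The level must be
// Z_DEFAULT_COMPRESSION or 0..9; the strategy must be one of
// Z_DEFAULT_STRATEGY (0), Z_FILTERED (1), Z_HUFFMAN_ONLY (2), Z_RLE (3)
// or Z_FIXED (4).
bool IsValidDeflateParams(int level, int strategy) {
  const bool level_ok =
      level == kZDefaultCompression ||
      (level >= kZNoCompression && level <= kZBestCompression);
  const bool strategy_ok =
      strategy >= kZDefaultStrategy && strategy <= kZFixed;
  return level_ok && strategy_ok;
}
```

Rejecting bad values at construction time gives the caller a clear error instead of a failure that only surfaces once the compressor is initialized.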
Re: GzipOutputStream is slower than gzipping the file by hand
OK. Make sure that the parameter is optional, with the default matching gzip's default. Thanks.

On Wed, Jul 15, 2009 at 4:27 PM, Peter Keen wrote:
> I'll work up a patch to pass the compression value
> through as another option to the constructor.
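One way to make the new argument optional, sketched under the assumption of a constructor that currently takes only a format: give it a defaulted trailing parameter, so existing call sites compile unchanged and pick up gzip's default. `CompressingStream` is a hypothetical stand-in that only records its settings; it is not protobuf's `GzipOutputStream`. The default of -1 is the value of zlib's `Z_DEFAULT_COMPRESSION`, which zlib documents as currently equivalent to level 6, the level gzip uses when given no flag.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a compressing stream; it only records settings.
class CompressingStream {
 public:
  enum Format { GZIP, ZLIB };

  // Mirrors zlib's Z_DEFAULT_COMPRESSION (-1), currently level 6 in zlib,
  // which matches the default of the gzip command-line tool.
  static constexpr int kDefaultLevel = -1;

  // The level is a defaulted trailing parameter, so callers that pass only
  // a format compile unchanged and get gzip-equivalent compression.
  explicit CompressingStream(Format format, int level = kDefaultLevel)
      : format_(format), level_(level) {
    if (level != kDefaultLevel && (level < 0 || level > 9)) {
      throw std::invalid_argument("level must be -1 or in [0, 9]");
    }
  }

  Format format() const { return format_; }
  int level() const { return level_; }

 private:
  Format format_;
  int level_;
};
```

A call such as `CompressingStream s(CompressingStream::ZLIB);` keeps the default, while `CompressingStream fast(CompressingStream::GZIP, 1);` opts into faster, lighter compression.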
Re: GzipOutputStream is slower than gzipping the file by hand
It doesn't look like it has an existing interface for setting zlib options. Reading through the source[1], it looks like it chooses Z_BEST_COMPRESSION (-9 to the gzip command line program) whereas gzip defaults to -6. I'll work up a patch to pass the compression value through as another option to the constructor.

--Pete

On Jul 15, 4:09 pm, Kenton Varda wrote:
> Does the interface (to GzipOutputStream) currently have a way
> to control compression parameters? If not, it probably should.
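The comparison above lines up gzip's command-line flags with zlib's numeric levels: `-1` through `-9` name the level directly, `--fast` and `--best` are aliases for `-1` and `-9`, and with no flag gzip uses `-6`. A small sketch of that correspondence follows; the helper name is hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps a gzip(1) compression flag to the zlib level it
// selects. An empty string stands for "no flag", where gzip uses level 6.
int ZlibLevelForGzipFlag(const std::string& flag) {
  if (flag.empty()) return 6;      // gzip's default (-6)
  if (flag == "--fast") return 1;  // same as -1 (Z_BEST_SPEED)
  if (flag == "--best") return 9;  // same as -9 (Z_BEST_COMPRESSION)
  if (flag.size() == 2 && flag[0] == '-' && flag[1] >= '1' && flag[1] <= '9') {
    return flag[1] - '0';          // -1 .. -9 name the level directly
  }
  throw std::invalid_argument("not a gzip compression-level flag: " + flag);
}
```

Under this mapping, `Z_BEST_COMPRESSION` is three levels above what gzip does by default, and higher levels search harder for matches, spending more CPU time per byte.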
Re: GzipOutputStream is slower than gzipping the file by hand
Hmm, probably GzipOutputStream is not setting the same compression parameters as gzip itself uses by default. I'm happy to accept a patch fixing this. Does the interface (to GzipOutputStream) currently have a way to control compression parameters? If not, it probably should.

On Wed, Jul 15, 2009 at 3:50 PM, Peter Keen wrote:
> Is there an option I need to be setting to bring it up to parity with
> the command-line program or could there be a bug in GzipOutputStream?
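One way to control compression parameters without growing the constructor's argument list is an options struct whose defaults already match gzip; callers override only the fields they need. The struct and function below are a hypothetical sketch with numeric defaults copied from zlib.h. Later protobuf releases do expose a similar `GzipOutputStream::Options` with `compression_level` and `compression_strategy` fields, so check the `gzip_stream.h` you build against rather than relying on this sketch.

```cpp
// Hypothetical options bundle; the numeric defaults mirror zlib.h constants.
struct CompressionOptions {
  enum Format { GZIP, ZLIB };

  Format format = GZIP;
  int compression_level = -1;    // Z_DEFAULT_COMPRESSION: level 6, like gzip
  int compression_strategy = 0;  // Z_DEFAULT_STRATEGY
};

// Hypothetical consumer: returns the level that would actually be used,
// resolving the Z_DEFAULT_COMPRESSION sentinel to zlib's current default, 6.
int EffectiveLevel(const CompressionOptions& options) {
  return options.compression_level == -1 ? 6 : options.compression_level;
}
```

A caller that wants speed over ratio would set only `compression_level = 1` and leave the format and strategy at their defaults. New fields can be added later without touching existing call sites.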
GzipOutputStream is slower than gzipping the file by hand
Hi guys,

I'm playing around with protocol buffers for a project at work and I'm running into a possibly weird problem. I have the following setup in my main():

    // Headers used by the setup; the statements below run inside main().
    #include <fcntl.h>
    #include <cerrno>
    #include <cstring>
    #include <iostream>
    #include <google/protobuf/io/coded_stream.h>
    #include <google/protobuf/io/gzip_stream.h>
    #include <google/protobuf/io/zero_copy_stream_impl.h>
    using namespace google::protobuf::io;

    std::cerr << "creating file" << std::endl;
    // O_CREAT goes in the flags (O_TRUNC drops old contents); 0644 is the new file's mode.
    int fd = open("blah.repo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
      std::cerr << "ERROR: " << errno << " " << strerror(errno) << std::endl;
      return 1;
    }

    ZeroCopyOutputStream* raw_output = new FileOutputStream(fd);
    GzipOutputStream* gzip_output =
        new GzipOutputStream(raw_output, GzipOutputStream::ZLIB);
    CodedOutputStream* coded_output = new CodedOutputStream(gzip_output);
    // CodedOutputStream* coded_output = new CodedOutputStream(raw_output);

This version takes, say, 8 seconds to create and serialize 100k simple messages. If I flip it to not use the GzipOutputStream, it takes roughly 1 second. Compressing that uncompressed output file with gzip(1) takes less than half a second.

Is there an option I need to set to bring it up to parity with the command-line program, or could there be a bug in GzipOutputStream? For what it's worth, reading through GzipInputStream is roughly on par with reading through a raw CodedInputStream.

Thanks,
Pete
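To compare the gzip-backed and raw write paths fairly, each should be timed the same way. A minimal sketch, assuming the workload is wrapped in a lambda: `TimeMillis` is a hypothetical helper built on `std::chrono::steady_clock`, a monotonic clock that is unaffected by wall-clock adjustments.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed time in
// milliseconds, measured with a monotonic clock.
double TimeMillis(const std::function<void()>& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Each lambda would build its stream stack, write the 100k messages, then delete the `CodedOutputStream` and close the `GzipOutputStream` and `FileOutputStream` before returning, so the final flush of compressed data falls inside the timed region.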