Re: GzipOutputStream is slower than gziping the file by hand

2009-07-17 Thread Peter Keen
Here is the patch. It adds parameters for both the compression level
and the compression strategy; each one defaults to the same value the
gzip binary uses.

--Pete

On Wed, Jul 15, 2009 at 4:37 PM, Kenton Varda wrote:
> OK.  Make sure that the parameter is optional, with the default matching
> gzip's default.  Thanks.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Attachment: gzip-compression-level.patch


Re: GzipOutputStream is slower than gziping the file by hand

2009-07-15 Thread Kenton Varda
OK.  Make sure that the parameter is optional, with the default matching
gzip's default.  Thanks.

On Wed, Jul 15, 2009 at 4:27 PM, Peter Keen  wrote:

>
> It doesn't look like it has an existing interface for setting zlib
> options. Reading through the source[1], it looks like it chooses
> Z_BEST_COMPRESSION (-9 to the gzip command line program) whereas gzip
> defaults to -6. I'll work up a patch to pass the compression value
> through as another option to the constructor.
>
> --Pete




Re: GzipOutputStream is slower than gziping the file by hand

2009-07-15 Thread Peter Keen

GzipOutputStream doesn't appear to have an existing interface for
setting zlib options. Reading through the source[1], it looks like it
uses Z_BEST_COMPRESSION (equivalent to -9 for the gzip command-line
program), whereas gzip defaults to -6. I'll work up a patch that passes
the compression level through as an additional constructor option.

--Pete

On Jul 15, 4:09 pm, Kenton Varda  wrote:
> Hmm, probably GzipOutputStream is not setting the same compression
> parameters as gzip itself uses by default.  I'm happy to accept a patch
> fixing this.  Does the interface (to GzipOutputStream) currently have a way
> to control compression parameters?  If not, it probably should.



Re: GzipOutputStream is slower than gziping the file by hand

2009-07-15 Thread Kenton Varda
Hmm, probably GzipOutputStream is not setting the same compression
parameters as gzip itself uses by default.  I'm happy to accept a patch
fixing this.  Does the interface (to GzipOutputStream) currently have a way
to control compression parameters?  If not, it probably should.

On Wed, Jul 15, 2009 at 3:50 PM, Peter Keen  wrote:

>
> Hi guys,
>
> I'm playing around with protocol buffers for a project at work and I'm
> coming across a possibly weird problem. I have the following setup in
> my main():
>
>    std::cerr << "creating file" << std::endl;
>    int fd = open("blah.repo", O_WRONLY, O_CREAT);
>    if ( fd == -1 ) {
>        std::cerr << "ERROR: " << errno << " " << strerror(errno) << std::endl;
>        return 1;
>    }
>
>    ZeroCopyOutputStream* raw_output = new FileOutputStream(fd);
>    GzipOutputStream* gzip_output = new GzipOutputStream(raw_output, GzipOutputStream::ZLIB);
>    CodedOutputStream* coded_output = new CodedOutputStream(gzip_output);
>    // CodedOutputStream* coded_output = new CodedOutputStream(raw_output);
>
> This version takes, say, 8 seconds to create and serialize 100k simple
> messages. If I flip it to not use the GzipOutputStream, it takes
> roughly 1 second. Using gzip(1) to compress the resulting file takes
> less than half a second.
>
> Is there an option I need to be setting to bring it up to parity with
> the command-line program or could there be a bug in GzipOutputStream?
> For what it's worth, GzipInputStream is roughly on parity with a raw
> CodedOutputStream.
>
> Thanks,
> Pete




GzipOutputStream is slower than gziping the file by hand

2009-07-15 Thread Peter Keen

Hi guys,

I'm experimenting with protocol buffers for a project at work and have
run into what may be a strange problem. I have the following setup in
my main():

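    #include <cerrno>
    #include <cstring>
    #include <fcntl.h>
    #include <iostream>
    #include <google/protobuf/io/coded_stream.h>
    #include <google/protobuf/io/gzip_stream.h>
    #include <google/protobuf/io/zero_copy_stream_impl.h>

    using namespace google::protobuf::io;

    // ... inside main():
    std::cerr << "creating file" << std::endl;
    // O_CREAT belongs in the flags; the third argument is the file mode.
    // O_TRUNC discards any previous contents of blah.repo.
    int fd = open("blah.repo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        std::cerr << "ERROR: " << errno << " " << strerror(errno) << std::endl;
        return 1;
    }

    ZeroCopyOutputStream* raw_output = new FileOutputStream(fd);
    GzipOutputStream* gzip_output =
        new GzipOutputStream(raw_output, GzipOutputStream::ZLIB);
    CodedOutputStream* coded_output = new CodedOutputStream(gzip_output);
    // CodedOutputStream* coded_output = new CodedOutputStream(raw_output);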

This version takes about 8 seconds to create and serialize 100k simple
messages. If I switch it to bypass the GzipOutputStream, it takes
roughly 1 second, and compressing the resulting file with gzip(1)
takes less than half a second.

Is there an option I need to set to bring it up to par with the
command-line program, or could there be a bug in GzipOutputStream?
For what it's worth, reading back through GzipInputStream is roughly
on par with reading the raw, uncompressed stream.

Thanks,
Pete
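One detail worth checking in this kind of setup: `open()` takes the creation flag as part of its second argument, and the optional third argument is only the permission mode. A call such as `open(path, O_WRONLY, O_CREAT)` therefore never creates the file; it only succeeds if the file already exists. A minimal POSIX sketch contrasting the two forms (the helper names and the 0644 mode are illustrative choices, not from the thread):

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <iostream>

// Opens `path` for writing, creating it if needed and truncating any
// existing contents. O_CREAT is OR-ed into the flags; 0644 (rw-r--r--,
// before umask) is the mode applied only when the file is created.
int OpenForWrite(const char* path) {
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd == -1) {
    std::cerr << "ERROR: " << errno << " " << std::strerror(errno)
              << std::endl;
  }
  return fd;
}

// The mistaken form: O_CREAT lands in the mode slot, so the flags are just
// O_WRONLY and open() fails with ENOENT when the file does not exist yet.
int OpenWithCreatAsMode(const char* path) {
  return open(path, O_WRONLY, O_CREAT);
}
```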
