Re: Java deserialization - any best practices for performances?

2009-07-24 Thread Kenton Varda
On Thu, Jul 23, 2009 at 7:15 PM, alopecoid <alopec...@gmail.com> wrote:

> Hmm... that strikes me as strange. I understand that the Message
> objects are immutable, but the Builders are as well? I thought that
> they would work more along the lines of String and StringBuilder,
> where String is obviously immutable and StringBuilder is mutable/
> reusable.


The point is that it's the Message object that contains all the stuff
allocated by the Builder, and therefore none of that stuff can actually be
reused.  (When you call build(), nothing is copied -- it just returns the
object that it has been working on.)  So reusing the builder itself is kind
of useless, because it's just a trivial object containing one pointer (to
the message object it is working on constructing).
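
A minimal sketch of that design, in case it helps to see it in code. Person and Person.Builder are hypothetical toy classes written for this illustration, not what protoc generates, and the exception on a second build() call is an assumption of the sketch:

```java
// Toy model only: Person and Person.Builder are hypothetical stand-ins,
// not generated protobuf classes.
final class Person {
    private String name = "";
    private int id;

    private Person() {}

    String getName() { return name; }
    int getId() { return id; }

    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        // The builder's only state: one pointer to the message being built.
        private Person result = new Person();

        Builder setName(String name) { result.name = name; return this; }
        Builder setId(int id) { result.id = id; return this; }

        // Returns the very object the setters were filling in; nothing is copied.
        Person build() {
            if (result == null) {
                // Assumption of this sketch: a builder is single-use.
                throw new IllegalStateException("build() already called");
            }
            Person built = result;
            result = null; // drop the pointer so the message can no longer change
            return built;
        }
    }
}
```

Everything the setters allocated now lives in the returned Person, so holding on to the Builder would save nothing except one tiny object.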


> But while we're on the subject, I have been looking for some rough
> benchmarks comparing the performance of Protocol Buffers in Java
> versus C++. Do you (the collective you) have any [rough] idea as to
> how they compare performance wise? I am thinking more in terms of
> batch-style processing (disk I/O, parsing centric) rather than RPC
> centric usage patterns. Any experiences you can share would be great.


I have some benchmarks that IIRC show that Java parsing and serialization are
roughly half the speed of C++.  As I recall a lot of the speed difference is
from UTF-8 decoding/encoding -- in C++ we just leave the bytes encoded, but
in Java we need to decode them in order to construct standard String
objects.
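
To make the extra step concrete, here is a rough sketch of the two ways a string field's bytes can be handled. StringFieldCost and its method names are made up for this illustration and are not part of the protobuf library:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not protobuf API.
final class StringFieldCost {
    // Roughly what the C++ side does: keep the field's bytes as they are.
    static byte[] keepEncoded(byte[] wire, int offset, int length) {
        return Arrays.copyOfRange(wire, offset, offset + length);
    }

    // What the Java side has to do to hand out a standard String:
    // decode the UTF-8 bytes into the String's internal representation.
    static String decode(byte[] wire, int offset, int length) {
        return new String(wire, offset, length, StandardCharsets.UTF_8);
    }
}
```

The decode call walks every byte of the field, which is work the byte-copying path never does; serialization pays the mirror-image cost when encoding the String back to UTF-8.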

I've been planning to release these benchmarks publicly but it will take
some work and there's a lot of higher-priority stuff to do.  :/  (I think
Jon Skeet did get the Java side of the benchmarks into SVN but there's no
C++ equivalent yet.)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: java string literal too long when initializing java.lang.String descriptorData

2009-07-24 Thread Kenton Varda
How annoying.  I'll make sure this or something like it gets into the next
release -- which I'm going to try to push next week.

On Wed, Jul 22, 2009 at 8:36 AM, anonymous <eric.pe...@hp.com> wrote:

> Hello,
>
> I was not able to compile a Java file generated by protoc 2.1.0 from a
> rather big .proto file.
> It seems I have hit the upper limit for a Java string literal
> (65535???).
>
> I slightly modified src/google/protobuf/compiler/java/java_file.cc so
> that static initialization is performed from
> an array of literal strings in the case CEscape(file_data).size() >
> 65535.
>
> Is this a real problem, or am I missing something ?
>
> Here is the patch:

> diff -r -u protobuf-2.1.0/src/google/protobuf/compiler/java/java_file.cc protobuf-2.1.0.new/src/google/protobuf/compiler/java/java_file.cc
> --- protobuf-2.1.0/src/google/protobuf/compiler/java/java_file.cc 2009-05-13 16:36:30.0 -0400
> +++ protobuf-2.1.0.new/src/google/protobuf/compiler/java/java_file.cc 2009-07-22 10:37:28.0 -0400
> @@ -207,6 +207,9 @@
>    // This makes huge bytecode files and can easily hit the compiler's internal
>    // code size limits (error "code to large").  String literals are apparently
>    // embedded raw, which is what we want.
> +  // In the case the FileDescriptorProto is too big for fitting into a string
> +  // literal, first creating an array of string literals, then concatenating
> +  // them into the final FileDescriptorProto string.
>    FileDescriptorProto file_proto;
>    file_->CopyTo(&file_proto);
>    string file_data;
> @@ -218,22 +221,51 @@
>      "  return descriptor;\n"
>      "}\n"
>      "private static com.google.protobuf.Descriptors.FileDescriptor\n"
> -    "    descriptor;\n"
> -    "static {\n"
> -    "  java.lang.String descriptorData =\n");
> -  printer->Indent();
> -  printer->Indent();
> +    "    descriptor;\n");
>
> -  // Only write 40 bytes per line.
>    static const int kBytesPerLine = 40;
> -  for (int i = 0; i < file_data.size(); i += kBytesPerLine) {
> -    if (i > 0) printer->Print(" +\n");
> -    printer->Print("\"$data$\"",
> -      "data", CEscape(file_data.substr(i, kBytesPerLine)));
> -  }
> -  printer->Print(";\n");
>
> -  printer->Outdent();
> +  // Limit for a Java literal string is 65535
> +  bool stringTooLong = (CEscape(file_data).size() > 65535);
> +
> +  if (stringTooLong) {
> +    printer->Print("static {\n"
> +      "  java.lang.String descriptorDataArray[] = {\n");
> +    printer->Indent();
> +    printer->Indent();
> +
> +    // Only write 40 bytes per line.
> +    for (int i = 0; i < file_data.size(); i += kBytesPerLine) {
> +      if (i > 0) printer->Print(",\n");
> +      printer->Print("\"$data$\"",
> +        "data", CEscape(file_data.substr(i, kBytesPerLine)));
> +    }
> +    printer->Outdent();
> +    printer->Print("\n"
> +      "};\n\n");
> +    printer->Print("java.lang.String descriptorData = \"\";\n");
> +    printer->Print("for (String data : descriptorDataArray) {\n");
> +    printer->Indent();
> +    printer->Print("descriptorData += data;\n");
> +    printer->Outdent();
> +    printer->Print("}\n\n");
> +  } else {
> +    printer->Print("static {\n"
> +      "  java.lang.String descriptorData =\n");
> +    printer->Indent();
> +    printer->Indent();
> +
> +    // Only write 40 bytes per line.
> +    static const int kBytesPerLine = 40;
> +    for (int i = 0; i < file_data.size(); i += kBytesPerLine) {
> +      if (i > 0) printer->Print(" +\n");
> +      printer->Print("\"$data$\"",
> +        "data", CEscape(file_data.substr(i, kBytesPerLine)));
> +    }
> +    printer->Print(";\n");
> +
> +    printer->Outdent();
> +  }
>
>    // -----------------------------------------------------------------
>    // Create the InternalDescriptorAssigner.
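
For readers wondering where the 65535 comes from: the class file format stores each string constant with a two-byte length, so a single literal (including one folded together from "a" + "b" constant pieces) cannot exceed 65535 bytes in its encoded form. The sketch below shows the general shape of the Java that the array-based approach would emit. The class name and chunk contents are placeholders invented for this illustration, not real protoc output, and it uses a StringBuilder where the patch uses +=:

```java
// Illustration only: DescriptorDataDemo and its chunk contents are
// placeholders, not a real serialized FileDescriptorProto.
final class DescriptorDataDemo {
    // Each element is its own constant, so each one stays far below the
    // per-literal limit even when the total is very large.
    static final String[] DESCRIPTOR_DATA_CHUNKS = {
        "\n\012demo.proto",
        "\022\004demo",
    };

    // Concatenate at class-initialization time instead of at compile time.
    static String join(String[] chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    static final String DESCRIPTOR_DATA = join(DESCRIPTOR_DATA_CHUNKS);
}
```

The key difference from the original generated code is that the pieces are joined at run time; joining them with + between literals would let the compiler fold them back into one oversized constant.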

 





Re: java string literal too long when initializing java.lang.String descriptorData

2009-07-24 Thread Christopher Smith

Heh, I remember complaining about this to James when they first
published the byte code spec. It was annoying then and continues to be
annoying today.


On 7/24/09, Kenton Varda <ken...@google.com> wrote:
> How annoying.  I'll make sure this or something like it gets into the next
> release -- which I'm going to try to push next week.


-- 
Sent from my mobile device

Chris




Re: Java deserialization - any best practices for performances?

2009-07-24 Thread Christopher Smith

The best way to think of it is:

Builder : Java Message :: C++ Message : const C++ Message

As far as performance goes, it is a common mistake to confuse C/C++
heap memory allocation costs with Java heap allocation costs. In the common
case, allocations in Java are just a few instructions... comparable to
stack allocations in C/C++. What normally gets you in Java is the
initialization cost, and in this particular scenario there is no way
around that.

If you are worried, you could benchmark the difference between
constantly allocating builders as you go vs. starting with an array of
N builders (allocating the array would be done outside of the
benchmark). I am sure it will prove enlightening.
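
A rough sketch of what such a comparison could look like. Msg and Msg.Builder are toy classes invented for this example rather than generated protobuf code, and the timing is a naive System.nanoTime loop, so treat any numbers it prints as a hint at most:

```java
// Naive micro-benchmark sketch; Msg and its Builder are hypothetical toys.
final class BuilderAllocBench {
    static final class Msg {
        int value;

        static final class Builder {
            private Msg result = new Msg();
            Builder setValue(int v) { result.value = v; return this; }
            Msg build() { Msg m = result; result = null; return m; }
        }
    }

    // Case 1: allocate a fresh builder for every message as you go.
    static long allocateAsYouGo(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += new Msg.Builder().setValue(i).build().value;
        }
        return sum; // returned so the work cannot be optimized away
    }

    // Builders for case 2 are created here, outside the timed region.
    static Msg.Builder[] preallocate(int n) {
        Msg.Builder[] builders = new Msg.Builder[n];
        for (int i = 0; i < n; i++) {
            builders[i] = new Msg.Builder();
        }
        return builders;
    }

    // Case 2: start with an array of N builders that already exist.
    static long usePreallocated(Msg.Builder[] builders) {
        long sum = 0;
        for (int i = 0; i < builders.length; i++) {
            sum += builders[i].setValue(i).build().value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (int round = 0; round < 5; round++) { // repeat so the JIT can warm up
            long t0 = System.nanoTime();
            long a = allocateAsYouGo(n);
            long t1 = System.nanoTime();
            Msg.Builder[] pool = preallocate(n);
            long t2 = System.nanoTime();
            long b = usePreallocated(pool);
            long t3 = System.nanoTime();
            System.out.printf("round %d: fresh %d us, preallocated %d us (sums %d, %d)%n",
                round, (t1 - t0) / 1000, (t3 - t2) / 1000, a, b);
        }
    }
}
```

Both paths build the same messages, so any difference between the two timings is the allocation and initialization cost that was moved out of the timed loop.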




-- 
Sent from my mobile device

Chris
