Re: Java deserialization - any best practices for performances?
On Thu, Jul 23, 2009 at 7:15 PM, alopecoid alopec...@gmail.com wrote:

> Hmm... that strikes me as strange. I understand that the Message objects are immutable, but the Builders are as well? I thought that they would work more along the lines of String and StringBuilder, where String is obviously immutable and StringBuilder is mutable/reusable.

The point is that it's the Message object that contains all the stuff allocated by the Builder, and therefore none of that stuff can actually be reused. (When you call build(), nothing is copied -- it just returns the object that it has been working on.) So reusing the builder itself is kind of useless, because it's just a trivial object containing one pointer (to the message object it is working on constructing).

> But while we're on the subject, I have been looking for some rough benchmarks comparing the performance of Protocol Buffers in Java versus C++. Do you (the collective you) have any [rough] idea as to how they compare performance-wise? I am thinking more in terms of batch-style processing (disk I/O, parsing-centric) rather than RPC-centric usage patterns. Any experiences you can share would be great.

I have some benchmarks that IIRC show that Java parsing and serialization is roughly half the speed of C++. As I recall, a lot of the speed difference is from UTF-8 decoding/encoding -- in C++ we just leave the bytes encoded, but in Java we need to decode them in order to construct standard String objects. I've been planning to release these benchmarks publicly but it will take some work and there's a lot of higher-priority stuff to do. :/ (I think Jon Skeet did get the Java side of the benchmarks into SVN but there's no C++ equivalent yet.)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
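The reply above describes a builder as little more than a pointer to the message it is filling in, with build() handing that same object over without copying. A minimal toy model of that design may make the point concrete. Everything here (ToyMessage, its fields, ToyBuilderDemo) is hypothetical and written for illustration; it is not the code protoc generates, and the "spent builder throws" behaviour is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: hypothetical classes, NOT the generated protobuf code.
final class ToyMessage {
    // Filled in by the nested Builder before build() is called.
    String name = "";
    List<Integer> values = new ArrayList<>();

    private ToyMessage() {}

    String getName() { return name; }
    List<Integer> getValues() { return Collections.unmodifiableList(values); }

    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        // The builder's only state: a pointer to the message under construction.
        private ToyMessage result = new ToyMessage();

        Builder setName(String name) { check(); result.name = name; return this; }
        Builder addValue(int v) { check(); result.values.add(v); return this; }

        // Hands over the very object that was being filled in; nothing is copied.
        ToyMessage build() {
            check();
            ToyMessage built = result;
            result = null; // the builder no longer owns anything worth reusing
            return built;
        }

        private void check() {
            if (result == null) {
                throw new IllegalStateException("build() already called");
            }
        }
    }
}

final class ToyBuilderDemo {
    // One fresh builder per message: the builder itself is a trivial allocation.
    static ToyMessage make(String name, int... values) {
        ToyMessage.Builder b = ToyMessage.newBuilder().setName(name);
        for (int v : values) {
            b.addValue(v);
        }
        return b.build();
    }

    public static void main(String[] args) {
        ToyMessage m = make("a", 1, 2, 3);
        System.out.println(m.getName() + " " + m.getValues());
    }
}
```

In this model all the interesting allocations (the list, the strings) belong to the message, so keeping the builder around would save only one small object per message.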
Re: java string literal too long when initializing java.lang.String descriptorData
How annoying. I'll make sure this or something like it gets into the next release -- which I'm going to try to push next week.

On Wed, Jul 22, 2009 at 8:36 AM, anonymous eric.pe...@hp.com wrote:

> Hello,
>
> I was not able to compile a Java file generated by protoc 2.1.0 from a rather big .proto file. It seems I have hit the upper limit for a Java string literal (65535???).
>
> I slightly modified src/google/protobuf/compiler/java/java_file.cc so that static initialization is performed from an array of literal strings in the case CEscape(file_data).size() > 65535.
>
> Is this a real problem, or am I missing something?
>
> Here is the patch:
>
> diff -r -u protobuf-2.1.0/src/google/protobuf/compiler/java/java_file.cc protobuf-2.1.0.new/src/google/protobuf/compiler/java/java_file.cc
> --- protobuf-2.1.0/src/google/protobuf/compiler/java/java_file.cc 2009-05-13 16:36:30.0 -0400
> +++ protobuf-2.1.0.new/src/google/protobuf/compiler/java/java_file.cc 2009-07-22 10:37:28.0 -0400
> @@ -207,6 +207,9 @@
>    // This makes huge bytecode files and can easily hit the compiler's internal
>    // code size limits (error "code to large").  String literals are apparently
>    // embedded raw, which is what we want.
> +  // In the case the FileDescriptorProto is too big for fitting into a string
> +  // literal, first creating an array of string literals, then concatenating
> +  // them into the final FileDescriptorProto string.
>    FileDescriptorProto file_proto;
>    file_->CopyTo(&file_proto);
>    string file_data;
> @@ -218,22 +221,51 @@
>      "  return descriptor;\n"
>      "}\n"
>      "private static com.google.protobuf.Descriptors.FileDescriptor\n"
> -    "    descriptor;\n"
> -    "static {\n"
> -    "  java.lang.String descriptorData =\n");
> -  printer->Indent();
> -  printer->Indent();
> +    "    descriptor;\n");
>
> -  // Only write 40 bytes per line.
>    static const int kBytesPerLine = 40;
> -  for (int i = 0; i < file_data.size(); i += kBytesPerLine) {
> -    if (i > 0) printer->Print(" +\n");
> -    printer->Print("\"$data$\"",
> -      "data", CEscape(file_data.substr(i, kBytesPerLine)));
> -  }
> -  printer->Print(";\n");
>
> -  printer->Outdent();
> +  // Limit for a Java literal string is 65535
> +  bool stringTooLong = (CEscape(file_data).size() > 65535);
> +
> +  if (stringTooLong) {
> +    printer->Print("static {\n"
> +      "  java.lang.String descriptorDataArray[] = {\n");
> +    printer->Indent();
> +    printer->Indent();
> +
> +    // Only write 40 bytes per line.
> +    for (int i = 0; i < file_data.size(); i += kBytesPerLine) {
> +      if (i > 0) printer->Print(",\n");
> +      printer->Print("\"$data$\"",
> +        "data", CEscape(file_data.substr(i, kBytesPerLine)));
> +    }
> +    printer->Outdent();
> +    printer->Print("\n"
> +      "};\n\n");
> +    printer->Print("java.lang.String descriptorData = \"\";\n");
> +    printer->Print("for (String data : descriptorDataArray) {\n");
> +    printer->Indent();
> +    printer->Print("descriptorData += data;\n");
> +    printer->Outdent();
> +    printer->Print("}\n\n");
> +  } else {
> +    printer->Print("static {\n"
> +      "  java.lang.String descriptorData =\n");
> +    printer->Indent();
> +    printer->Indent();
> +
> +    // Only write 40 bytes per line.
> +    static const int kBytesPerLine = 40;
> +    for (int i = 0; i < file_data.size(); i += kBytesPerLine) {
> +      if (i > 0) printer->Print(" +\n");
> +      printer->Print("\"$data$\"",
> +        "data", CEscape(file_data.substr(i, kBytesPerLine)));
> +    }
> +    printer->Print(";\n");
> +
> +    printer->Outdent();
> +  }
>
>    // -----------------------------------------------------------------
>    // Create the InternalDescriptorAssigner.
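For readers who want to see what the patch is aiming at on the Java side, here is a rough sketch of the shape of code it would generate. The background fact is that a string constant in a class file is stored with a 16-bit length, so a single literal (or a chain of literals joined with +, which javac folds into one constant) cannot exceed 65535 encoded bytes; an array of small literals joined at run time avoids that. The class name, the chunk contents, and the helper methods below are all hypothetical placeholders, and the sketch joins with a StringBuilder where the patch emits `descriptorData += data` in a loop.

```java
import java.util.Arrays;

// Sketch of the "array of small literals, joined at run time" shape.
// Names and chunk contents are made up; real generated code would hold
// the escaped bytes of the serialized FileDescriptorProto.
final class ChunkedLiteralDemo {
    // Each element is its own small constant, so no single constant
    // comes near the 65535-byte class-file limit.
    private static final String[] DESCRIPTOR_DATA_ARRAY = {
        "\n\013demo.proto",
        "\022\004demo",
        "\"\007\n\005Hello",
    };

    // Joined once, in the static initializer, like descriptorData in the patch.
    static final String DESCRIPTOR_DATA = join(DESCRIPTOR_DATA_ARRAY);

    // Run-time concatenation; the result may be far longer than 65535 chars.
    static String join(String[] chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    // Helper for the demo: simulates a large descriptor split into equal chunks.
    static String[] manyChunks(int count, String chunk) {
        String[] chunks = new String[count];
        Arrays.fill(chunks, chunk);
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println("small descriptor length: " + DESCRIPTOR_DATA.length());
        // 2000 chunks of 40 characters, mirroring kBytesPerLine = 40 in the patch.
        String big = join(manyChunks(2000, "0123456789012345678901234567890123456789"));
        System.out.println("large joined length exceeds 65535: " + (big.length() > 65535));
    }
}
```

One design note: the array form trades a single large constant for many small ones plus a loop at class-initialization time, which is a one-off cost per generated file.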
Re: java string literal too long when initializing java.lang.String descriptorData
Heh, I remember complaining about this to James when they first published the byte code spec. It was annoying then and continues to be annoying today.

On 7/24/09, Kenton Varda ken...@google.com wrote:

> How annoying. I'll make sure this or something like it gets into the next release -- which I'm going to try to push next week.

--
Chris
Re: Java deserialization - any best practices for performances?
The best way to think of it is:

Builder : Java Message :: C++ Message : const C++ Message

As far as performance goes, it is a common mistake to confuse C/C++ heap allocation costs with Java heap allocation costs. In the common case, an allocation in Java is just a few instructions, comparable to a stack allocation in C/C++. What normally gets you in Java is the initialization cost, and in this particular scenario there is no way around that.

If you are worried, you could benchmark the difference between constantly allocating builders as you go vs. starting with an array of N builders (allocating the array would be done outside of the benchmark). I am sure it will prove enlightening.

On 7/24/09, Kenton Varda ken...@google.com wrote:

> The point is that it's the Message object that contains all the stuff allocated by the Builder, and therefore none of that stuff can actually be reused. (When you call build(), nothing is copied -- it just returns the object that it has been working on.) So reusing the builder itself is kind of useless, because it's just a trivial object containing one pointer (to the message object it is working on constructing).

--
Chris
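The benchmark Chris suggests could be sketched roughly as follows. TinyBuilder is a hypothetical stand-in, not a generated protobuf builder, and the timing loop is deliberately crude: the JIT may optimise away short-lived allocations entirely, so a harness such as JMH would be the better tool for numbers you intend to rely on. The pre-allocated array is built before the timed region, as the message proposes.

```java
// Crude allocation benchmark sketch; TinyBuilder is a made-up stand-in class.
final class BuilderAllocBench {
    static final class TinyBuilder {
        private int value;
        TinyBuilder setValue(int v) { value = v; return this; }
        int build() { return value; }
    }

    // Variant 1: allocate a fresh builder for every item.
    static long allocateAsYouGo(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += new TinyBuilder().setValue(i).build();
        }
        return sum;
    }

    // Variant 2: use builders allocated up front, outside the timed region.
    static long usePreallocated(TinyBuilder[] pool) {
        long sum = 0;
        for (int i = 0; i < pool.length; i++) {
            sum += pool[i].setValue(i).build();
        }
        return sum;
    }

    static TinyBuilder[] newPool(int n) {
        TinyBuilder[] pool = new TinyBuilder[n];
        for (int i = 0; i < n; i++) {
            pool[i] = new TinyBuilder();
        }
        return pool;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        TinyBuilder[] pool = newPool(n); // not timed

        // Warm-up rounds so the JIT has compiled both loops before timing.
        for (int round = 0; round < 5; round++) {
            allocateAsYouGo(n);
            usePreallocated(pool);
        }

        long t0 = System.nanoTime();
        long a = allocateAsYouGo(n);
        long t1 = System.nanoTime();
        long b = usePreallocated(pool);
        long t2 = System.nanoTime();

        // Printing the sums keeps the results live so the loops are not dead code.
        System.out.println("allocate as you go: " + (t1 - t0) / 1_000 + " us (sum " + a + ")");
        System.out.println("preallocated array: " + (t2 - t1) / 1_000 + " us (sum " + b + ")");
    }
}
```

Timings will vary by JVM and machine, so no expected figures are given here.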