Re: Java deserialization - any best practices for performances?
Hi Kenton,

Thanks for your reply.

> You can't continue to use a Builder after calling build(). Even if we made
> it so you could, it would be building an entirely new object, not reusing
> the old one. We can't make it reuse the old one because that would break
> the immutability guarantee of message objects.

Hmm... that strikes me as strange. I understand that the Message objects are immutable, but are the Builders immutable as well? I thought that they would work more along the lines of String and StringBuilder, where String is obviously immutable and StringBuilder is mutable/reusable.

> But seriously, object allocation with a modern generational garbage
> collector is extremely cheap, especially for objects that don't stick around
> very long. So I don't think there's much to gain here.

While I agree that object allocation is relatively cheap in Java, I have noticed that if you generate a lot of garbage, you also have to spend some time tweaking the garbage collector settings to avoid long/frequent garbage collection pauses. I know that there has been a lot of recent work done in Java 7 (and experimentally in Java 6) to avoid this, but I haven't had the opportunity to test it yet.

In fact, I find that oftentimes this is the real difference in performance between Java and C++ in the cases where C++ seems to perform significantly faster: different object allocation practices (but more importantly, implementation/design choices). I don't know how well this holds true across a spectrum of different usage patterns, but my experience has been more from the large-scale data processing side of things. And don't get me wrong, I'm actually one of the few people (out of my closest colleagues) who think that data processing can and should be done in Java over C++, but that's another discussion entirely :)

But while we're on the subject, I have been looking for some rough benchmarks comparing the performance of Protocol Buffers in Java versus C++.
Do you (the collective you) have any [rough] idea how they compare performance-wise? I am thinking more in terms of batch-style processing (disk I/O and parsing-centric) rather than RPC-centric usage patterns. Any experiences you can share would be great.

Thanks!

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/protobuf?hl=en -~--~~~~--~~--~--~---
Re: Java deserialization - any best practices for performances?
On Thu, Jul 23, 2009 at 12:32 AM, alopecoid wrote:
>
> Hi,
>
> I haven't actually used the Java protobuf API, but it seems to me from
> the quick occasional glance that this isn't entirely true. I mean,
> specifically in response to the code snippet posted in the original
> message, I would possibly:
>
> 1. Reuse the Builder object by calling its clear() method. This would
> avoid the need to create a new Builder object for each iteration
> of the outermost loop.

You can't continue to use a Builder after calling build(). Even if we made it so you could, it would be building an entirely new object, not reusing the old one. We can't make it reuse the old one because that would break the immutability guarantee of message objects.

Reusing the actual builder object is not that useful since it's only a very small object containing a pointer to a message object.

> 2. Iterate over the repeated field using the get*Count() and get*(index)
> methods instead of the get*List() method. I'm not sure if this
> would save anything, but depending on how things are implemented in
> the generated code, this could avoid allocating a new List object.

Won't save anything; we still need a list object internally.

But seriously, object allocation with a modern generational garbage collector is extremely cheap, especially for objects that don't stick around very long. So I don't think there's much to gain here.
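To make the immutability argument concrete, here is a toy model in plain Java. It assumes, as Kenton describes, that a builder is only a thin wrapper holding a pointer to the object under construction. `Point` and its `Builder` are hypothetical classes, not protobuf's generated code. The sketch shows why build() has to give up that pointer, and why even a clear() that allows reuse still has to allocate a fresh object.

```java
// Toy model (hypothetical classes, NOT protobuf's generated code) of a builder
// that is only a thin wrapper around the object it is building.
final class Point {
    private int x;
    private int y;

    private Point() {}

    int getX() { return x; }
    int getY() { return y; }

    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        // The only state is a pointer to the Point under construction.
        private Point result = new Point();

        Builder setX(int value) { checkNotBuilt(); result.x = value; return this; }
        Builder setY(int value) { checkNotBuilt(); result.y = value; return this; }

        // "Reusing" the builder still has to start from a brand-new Point,
        // so the only thing saved is this small Builder wrapper.
        Builder clear() { result = new Point(); return this; }

        Point build() {
            checkNotBuilt();
            Point built = result;
            // Give up the pointer: if the builder could keep writing into
            // `built`, the Point already handed out would not be immutable.
            result = null;
            return built;
        }

        private void checkNotBuilt() {
            if (result == null) {
                throw new IllegalStateException("build() has already been called on this Builder");
            }
        }
    }
}
```

In this model, calling `setX()` after `build()` throws instead of silently mutating a `Point` that someone else already holds. The allocation that clear() avoids is only the small `Builder`, not the `Point` itself.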
Re: Compiling on AIX 5.3 using xlC 3.55 compiler
It looks like unordered_map or unordered_set is not behaving correctly (e.g. not finding matching keys when they are present), but I can't really tell just from that output.

On Mon, Jul 20, 2009 at 11:53 AM, vikram wrote:
>
> In the previous attempt the small test worked, but it did not work with the
> protocol buffer source code.
> I tried one more thing:
> #if defined (MISSING_HASH_MAP) && defined (__xlC__)
> #define hash_map std::tr1::unordered_map
>
> google/protobuf/unittest.proto:85:37: Expected ";".
> google/protobuf/unittest.proto:117:37: Expected ";".
> google/protobuf/unittest.proto:133:55: Expected identifier.
> google/protobuf/unittest.proto:134:55: Expected identifier.
> google/protobuf/unittest.proto:135:55: Expected identifier.
> google/protobuf/unittest.proto:136:55: Expected identifier.
> google/protobuf/unittest.proto:137:54: Expected identifier.
> google/protobuf/unittest.proto:138:55: Expected identifier.
> google/protobuf/unittest.proto:139:55: Expected identifier.
> google/protobuf/unittest.proto:140:55: Expected identifier.
> google/protobuf/unittest.proto:141:55: Expected identifier.
> google/protobuf/unittest.proto:142:54: Expected identifier.
> google/protobuf/unittest.proto:143:55: Expected identifier.
> google/protobuf/unittest.proto:144:55: Expected identifier.
> google/protobuf/unittest.proto:146:54: Expected identifier.
> google/protobuf/unittest.proto:147:54: Expected identifier.
> google/protobuf/unittest.proto:154:73: Expected identifier.
> google/protobuf/unittest.proto:155:57: Expected identifier.
> google/protobuf/unittest.proto:192:47: Expected ";".
> google/protobuf/unittest.proto:226:47: Expected ";".
> google/protobuf/unittest.proto:244:65: Expected identifier.
> google/protobuf/unittest.proto:245:65: Expected identifier.
> google/protobuf/unittest.proto:246:65: Expected identifier.
> google/protobuf/unittest.proto:247:65: Expected identifier.
> google/protobuf/unittest.proto:248:64: Expected identifier.
> google/protobuf/unittest.proto:249:65: Expected identifier.
> google/protobuf/unittest.proto:250:65: Expected identifier.
> google/protobuf/unittest.proto:251:65: Expected identifier.
> google/protobuf/unittest.proto:252:65: Expected identifier.
> google/protobuf/unittest.proto:253:64: Expected identifier.
> google/protobuf/unittest.proto:254:65: Expected identifier.
> google/protobuf/unittest.proto:255:65: Expected identifier.
> google/protobuf/unittest.proto:257:64: Expected identifier.
> google/protobuf/unittest.proto:258:64: Expected identifier.
> google/protobuf/unittest.proto:268:64: Expected identifier.
> google/protobuf/unittest.proto:269:68: Expected identifier.
> google/protobuf/unittest.proto:276:42: Expected identifier.
> google/protobuf/unittest.proto:379:26: Expected ";".
> google/protobuf/unittest.proto:380:26: Expected ";".
> google/protobuf/unittest.proto:451:47: Expected identifier.
> google/protobuf/unittest.proto:452:47: Expected identifier.
> google/protobuf/unittest.proto:453:47: Expected identifier.
> google/protobuf/unittest.proto:454:47: Expected identifier.
> google/protobuf/unittest.proto:455:47: Expected identifier.
> google/protobuf/unittest.proto:460:46: Expected identifier.
> gmake: *** [unittest_proto_middleman] Error 1
>
> So I guess it won't even compile with the unordered_map provided on Linux.
> Please provide some input on this one.
>
> Thanks & Regards,
> Vikram
>
> On Jul 14, 6:35 pm, Kenton Varda wrote:
> > It looks like your implementation of hash_map is not working correctly --
> > all lookups are failing. You might try writing a little test for hash_map
> > itself that would be easier to debug.
> >
> > On Tue, Jul 14, 2009 at 6:27 PM, vikram wrote:
> >
> > > Kenton & Monty,
> > >
> > > I added a hack as follows in hash.h:
> > >
> > > // File changed .
> > >
> > > #if defined(HAVE_HASH_MAP) && defined(HAVE_HASH_SET)
> > > #include HASH_MAP_H
> > > #include HASH_SET_H
> > > #elif defined (__xlC__)
> > > #define MISSING_HASH
> > > #include
> > > #include
> > > #else
> > > #define MISSING_HASH
> > > #include
> > > #include
> > > #endif
> > >
> > > namespace google {
> > > namespace protobuf {
> > > #if defined(MISSING_HASH) && defined(__xlC__)
> > >
> > > //@TODO
> > > //Inherit hash_map from unordered_map
> > > template <typename Key>
> > > struct hash : public std::tr1::hash<Key> {
> > > };
> > >
> > > template <typename Key>
> > > struct hash<const Key*> {
> > >   inline size_t operator()(const Key* key) const {
> > >     return reinterpret_cast<size_t>(key);
> > >   }
> > > };
> > >
> > > template <typename Key, typename Data,
> > >           typename HashFcn = hash<Key>,
> > >           typename EqualKey = std::equal_to<Key> >
> > > class hash_map : public std::tr1::unordered_map<
> > >     Key, Data, HashFcn, EqualKey> {
> > > };
> > >
> > > template <typename Key,
> > >           typename HashFcn = hash<Key>,
> > >           typename EqualKey = std::equal_to<Key> >
> > > class hash_set : public std::tr1::unordered_set<
> > >     Key, HashFcn, EqualKey> {
> > > };
> > > #elif defined(MISSING_HASH)
> > >
> > > File continues as it is
> > >
> > > Stack trace
> > >
> > > pthrea
Re: Detecting message type prior parsing a ByteBuffer containing the message
On Jul 23, 12:54 am, Benedikt Hallinger wrote:
> Hello (and sorry for bothering), and thank you for clarifying.
>
> > The "type" is simply transmitted as a tag on the wire. The wire format
> > of extensions is just like the wire format of a regular field - the
> > difference is that the containing type (Packet) need not know about
> > the extension fields in the .proto definition. Instead, each payload
> > type that you might include will have a unique extension number. When
> > you include one of the payload messages in your program, it will
> > register its extensions of Packet. Then when a packet message is
> > parsed, if it sees a tag that is not defined in the .proto but is a
> > valid extension, it looks up the extension information and parses it
> > appropriately.
>
> The idea is good and clear, but I have problems figuring out the code.
> The big problem is that the PacketManager should be designed to be protocol
> agnostic, so that it can deal with all protocols designed in the extension
> way you described.
>
> That said, if I understood it right, I would write it as follows:
> ---File: myProto.proto---
> message Packet {
>   extensions n to m;
> }
>
> // Payload messages
> // the type field must be the same for all messages, with differing tag number
> message TestMessage {
>   extend Packet {
>     optional TestMessage type = 10;
>   }
>
>   required string msg = 2;
> }
> -
>
> ---File: PacketHandler.java---
> class PacketHandler {
>   ...
>
>   // Gets called whenever a new message arrives
>   // ByteBuffer is passed to this method containing
>   // the raw data from the net.
>   public void messageRecieved(ByteBuffer buffer) {
>     byte[] bytes = buffer.array(); // fetch bytes from buffer
>
>     // parse the header data;
>     // this.header is passed in the constructor of PacketHandler;
>     // it is an instance of a "Packet" message.
>     ExtendableMessage header =
>         this.header.newBuilderForType().mergeFrom(bytes).build();

You need to provide the ExtensionRegistry to the builder when invoking mergeFrom() so that it knows how to parse the extended types.

>     // let's see what tag number the extension has; this is the type identifier.
>     // this is done via the extension registry.
>     ExtensionRegistry extReg = ExtensionRegistry.newInstance();
>     int type = extReg.findExtensionByName("type").descriptor().getNumber();

This doesn't access anything in the message that was transmitted; it just checks whether an extension named "type" was registered. Instead, you can use the reflection methods to determine which fields are set in the parsed message, maybe something like:

  Map<FieldDescriptor, Object> field_map = header.getAllFields();
  for (Map.Entry<FieldDescriptor, Object> field : field_map.entrySet()) {
    HandleType(field.getKey().getNumber(), // the extension number of the message that was transmitted over the wire
               field.getValue());
  }

>     // invoke the right handler for that message; it will take care of the rest:
>     getHandler(type).handle(header.getExtension(type));
>   }
>
>   ...
> }
> -
>
> However, when I tried that, it gave me errors I can't solve.
> As long as I work with concrete classes and names, all is okay, but that is
> not what I want here. What I want is to avoid that hardcoded "switch"
> statement on types, so it is easier to add new message types without having
> to fiddle around with much code.
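One way to get rid of the hardcoded "switch" is a registry keyed by field (extension) number, fed from the getAllFields() loop above. The sketch below is plain Java 8. `PacketDispatcher`, `register` and `dispatch` are hypothetical names, not part of the protobuf library.

With the real library, each loop iteration would call something like `dispatcher.dispatch(field.getKey().getNumber(), field.getValue())`. This assumes the Packet was parsed with an ExtensionRegistry containing the payload extensions. Extensions that are not registered are kept as unknown fields and never show up in getAllFields().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher: maps an extension (field) number to a handler,
// so new payload types are added by registration instead of a switch.
final class PacketDispatcher {
    private final Map<Integer, Consumer<Object>> handlers = new HashMap<>();

    void register(int fieldNumber, Consumer<Object> handler) {
        if (handlers.putIfAbsent(fieldNumber, handler) != null) {
            throw new IllegalArgumentException("duplicate handler for field " + fieldNumber);
        }
    }

    // Called once per (field number, value) pair, e.g. for each entry that
    // getAllFields() reports as set on the parsed Packet.
    boolean dispatch(int fieldNumber, Object value) {
        Consumer<Object> handler = handlers.get(fieldNumber);
        if (handler == null) {
            return false; // unknown payload type; the caller decides what to do
        }
        handler.accept(value);
        return true;
    }
}
```

Each payload module would call `register(10, ...)` (10 being TestMessage's extension number in the proto above) at startup. Adding a new message type then means adding one registration rather than editing a central switch.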
Re: Writing a router that can route messages
Thanks.

On 16 July, 00:43, Kenton Varda wrote:
> On Wed, Jul 15, 2009 at 10:56 AM, jasonh wrote:
> > Foo f = Foo.newBuilder().mergeFrom(...).setCookie(newCookieVal).build();
>
> There's actually a shortcut for this: the toBuilder() method of the Message
> interface returns a Builder that is pre-initialized as a copy of that
> message.
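As a plain-Java toy model, the toBuilder() shortcut amounts to copying the fields into a fresh builder, and the original message is never touched. `RoutedMessage` and its methods are hypothetical, not generated code. With real generated classes the router's update would read roughly `Foo f = original.toBuilder().setCookie(newCookieVal).build();`, assuming `Foo` has a cookie field as in jasonh's example.

```java
// Hypothetical message type (not protobuf-generated) modeling toBuilder():
// the message stays immutable; "modifying" it means copying into a builder.
final class RoutedMessage {
    private final String cookie;
    private final String payload;

    private RoutedMessage(Builder b) {
        this.cookie = b.cookie;
        this.payload = b.payload;
    }

    String getCookie() { return cookie; }
    String getPayload() { return payload; }

    static Builder newBuilder() { return new Builder(); }

    // Returns a fresh builder pre-initialized with a copy of this message's fields.
    Builder toBuilder() {
        return newBuilder().setCookie(cookie).setPayload(payload);
    }

    static final class Builder {
        private String cookie = "";
        private String payload = "";

        Builder setCookie(String value) { cookie = value; return this; }
        Builder setPayload(String value) { payload = value; return this; }

        RoutedMessage build() { return new RoutedMessage(this); }
    }
}
```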
Re: Writing a router that can route messages
Thanks Jason,

Yes, all messages would use the same tag number for the "cookie" field, and all messages must have that field, but that is all that the router should know, and it must be able to append data to that field.

I think I understand what to do now, and thanks for the tip about where to look when it comes to modifying messages.

Kaj

On 15 July, 19:56, jasonh wrote:
> On Jul 15, 5:33 am, Kaj Bjurman wrote:
>
> > Hi,
> >
> > I have just read the tutorials and some threads in these forums,
> > but there's one thing that I can't find an answer for. I'm currently
> > using a router that is responsible for routing different messages to
> > different servers. The protocol that we are currently using is very
> > simple. Basically header:tag=value:tag=value: and so on.
> >
> > The router looks for a field with a certain name (e.g. cookie) and
> > routes a message based on the information in the value of that tag,
> > and also modifies that field.
> >
> > Is it possible to write a similar router for PB? The router that we
> > have shouldn't know about all message types. It just needs to know that
> > a field with a certain tag should always exist and that the field has
> > a value. I don't parse the data off the wire into messages, do
> > modifications and then write them back to the wire.
> >
> > So, in short: how do I do that using PB?
>
> I'm a little unclear about your requirement for arbitrary types. Do
> your message types all define some common header values? The wire
> format doesn't have a notion of tag names, simply tag numbers and
> types. Therefore, unless all your messages define "cookie" to have the
> same tag number, the router won't know what field to look for.
> However, I think you can accomplish what you want by defining a header
> message containing the fields your router needs to know about, and
> include the payload as raw data that is interpreted by clients who
> know what specific message type to read.
>
> You can make sure that the field exists by labeling it "required" in
> the message definition - then AbstractMessage.isInitialized() will
> fail if the field has not been set.
>
> > I'm also a bit curious about how to decode arbitrary messages, modify
> > them and write them back (even if I won't do that). How do you
> > normally modify messages since they are immutable? Is there a way to
> > get a builder for a certain message, and have the builder initialized
> > so that it represents a copy of the original message?
>
> Yes, the Builder class is what you want here. You can initialize a
> builder using the original message, or better yet directly from the
> serialized encoding from the wire (assuming you don't care about the
> original message).
>
> Foo f = Foo.newBuilder().mergeFrom(...).setCookie(newCookieVal).build();
>
> Hope that helps,
> Jason
>
> > I'm using Java if that makes a difference
> >
> > Thanks
> > Kaj
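The wire format carries only field numbers and wire types, as Jason notes. That means a router that knows nothing except the cookie's field number can scan raw bytes without any generated classes.

The sketch below is minimal. `WireScanner` and `findStringField` are hypothetical names. The tag layout (`field_number << 3 | wire_type`, base-128 varints) follows the protobuf encoding documentation. It assumes the cookie is a string/bytes field and that no deprecated group fields appear. Rewriting the field in place would also require re-encoding its length prefix, which is not shown.

```java
import java.nio.charset.StandardCharsets;

// Sketch: pull one length-delimited field (e.g. a shared "cookie" field) out of
// protobuf wire-format bytes without knowing the message type.
// Assumptions: the field is a string/bytes field, and no deprecated group
// fields (wire types 3 and 4) appear in the message.
final class WireScanner {
    private final byte[] buf;
    private int pos;

    private WireScanner(byte[] buf) {
        this.buf = buf;
    }

    /** Returns the last occurrence of the field as UTF-8 text, or null if absent. */
    static String findStringField(byte[] message, int wantedField) {
        WireScanner s = new WireScanner(message);
        String found = null;
        while (s.pos < message.length) {
            long tag = s.readVarint();
            int fieldNumber = (int) (tag >>> 3); // upper bits: field number
            int wireType = (int) (tag & 0x7);    // low 3 bits: wire type
            switch (wireType) {
                case 0: s.readVarint(); break;   // varint
                case 1: s.skip(8); break;        // fixed 64-bit
                case 5: s.skip(4); break;        // fixed 32-bit
                case 2: {                        // length-delimited
                    int len = (int) s.readVarint();
                    int start = s.pos;
                    s.skip(len);
                    if (fieldNumber == wantedField) {
                        // For singular fields the last occurrence wins.
                        found = new String(message, start, len, StandardCharsets.UTF_8);
                    }
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return found;
    }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    private void skip(int n) {
        if (n < 0 || n > buf.length - pos) {
            throw new IllegalArgumentException("truncated field");
        }
        pos += n;
    }
}
```

This only works if every message type uses the same field number for the cookie, which matches what Kaj confirms above. Jason's alternative (a fixed header message plus an opaque payload) avoids scanning altogether.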
Re: Detecting message type prior parsing a ByteBuffer containing the message
Hello (and sorry for bothering), and thank you for clarifying.

> The "type" is simply transmitted as a tag on the wire. The wire format
> of extensions is just like the wire format of a regular field - the
> difference is that the containing type (Packet) need not know about
> the extension fields in the .proto definition. Instead, each payload
> type that you might include will have a unique extension number. When
> you include one of the payload messages in your program, it will
> register its extensions of Packet. Then when a packet message is
> parsed, if it sees a tag that is not defined in the .proto but is a
> valid extension, it looks up the extension information and parses it
> appropriately.

The idea is good and clear, but I have problems figuring out the code. The big problem is that the PacketManager should be designed to be protocol agnostic, so that it can deal with all protocols designed in the extension way you described.

That said, if I understood it right, I would write it as follows:

---File: myProto.proto---
message Packet {
  extensions n to m;
}

// Payload messages
// the type field must be the same for all messages, with differing tag number
message TestMessage {
  extend Packet {
    optional TestMessage type = 10;
  }

  required string msg = 2;
}
-

---File: PacketHandler.java---
class PacketHandler {
  ...

  // Gets called whenever a new message arrives
  // ByteBuffer is passed to this method containing
  // the raw data from the net.
  public void messageRecieved(ByteBuffer buffer) {
    byte[] bytes = buffer.array(); // fetch bytes from buffer

    // parse the header data;
    // this.header is passed in the constructor of PacketHandler;
    // it is an instance of a "Packet" message.
    ExtendableMessage header =
        this.header.newBuilderForType().mergeFrom(bytes).build();

    // let's see what tag number the extension has; this is the type identifier.
    // this is done via the extension registry.
    ExtensionRegistry extReg = ExtensionRegistry.newInstance();
    int type = extReg.findExtensionByName("type").descriptor().getNumber();

    // invoke the right handler for that message; it will take care of the rest:
    getHandler(type).handle(header.getExtension(type));
  }

  ...
}
-

However, when I tried that, it gave me errors I can't solve. As long as I work with concrete classes and names, all is okay, but that is not what I want here. What I want is to avoid that hardcoded "switch" statement on types, so it is easier to add new message types without having to fiddle around with much code.
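Since "the type is simply transmitted as a tag on the wire", a protocol-agnostic component can in principle read the first tag of the serialized Packet before any parsing. The sketch below assumes, hypothetically, that each Packet carries exactly one payload extension and no other fields. Under that assumption the first tag's field number is the extension number (10 for TestMessage in the proto above).

`TagPeek` is a hypothetical helper, and this is a lower-level alternative to the reflective getAllFields() approach. The payload itself still has to be parsed with an ExtensionRegistry.

```java
// Sketch: peek at the first tag of a serialized message to learn which field
// (here: which Packet extension) it starts with, without parsing the message.
// Assumes each Packet carries exactly one payload extension and no other fields.
final class TagPeek {
    private TagPeek() {}

    /** Returns the field number of the first tag in {@code data}, or -1 if empty. */
    static int firstFieldNumber(byte[] data) {
        if (data.length == 0) {
            return -1;
        }
        long tag = 0;
        // A tag is a base-128 varint of at most 5 bytes.
        for (int i = 0, shift = 0; i < data.length && shift < 35; i++, shift += 7) {
            tag |= (long) (data[i] & 0x7F) << shift;
            if ((data[i] & 0x80) == 0) {
                return (int) (tag >>> 3); // the low 3 bits are the wire type
            }
        }
        throw new IllegalArgumentException("truncated or malformed tag");
    }
}
```

The returned number could then index a handler table instead of a hardcoded switch.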
Re: Java deserialization - any best practices for performances?
Hi,

I haven't actually used the Java protobuf API, but it seems to me from the quick occasional glance that this isn't entirely true. I mean, specifically in response to the code snippet posted in the original message, I would possibly:

1. Reuse the Builder object by calling its clear() method. This would avoid the need to create a new Builder object for each iteration of the outermost loop.

2. Iterate over the repeated field using the get*Count() and get*(index) methods instead of the get*List() method. I'm not sure if this would save anything, but depending on how things are implemented in the generated code, this could avoid allocating a new List object.

Also, might "bytes" type fields perform better than any "string" type fields that you may have in your particular data set? I'm not sure, but it might be worth benchmarking.

On Jul 18, 9:22 pm, Kenton Varda wrote:
> On Fri, Jul 17, 2009 at 8:13 PM, Alex Black wrote:
>
> > When I write out messages using C++ I'm careful to clear messages and
> > re-use them, is there something equivalent on the java side when
> > reading those same messages in?
>
> No. Sorry. This just doesn't fit at all with the Java library's design,
> and even if it did, you cannot reuse Java String objects, which often
> account for most of the memory usage. However, memory allocation is cheaper
> in Java than in C++, so there's less to gain from it.
>
> > My code looks like:
> >
> > CodedInputStream stream = CodedInputStream.newInstance(inputStream);
> >
> > while ( !stream.isAtEnd() )
> > {
> >     MyMessage.Builder builder = MyMessage.newBuilder();
> >     stream.readMessage(builder, null);
> >     MyMessage myMessage = builder.build();
> >
> >     for ( MessageValue messageValue : myMessage.getValuesList() )
> >     {
> >         ..
> >     }
> > }
> >
> > I'm passing 150 messages each with 1000 items, so presumably memory is
> > allocated 150 times for each of the messages...
> >
> > - Alex
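Alex's loop reads a stream of varint-length-prefixed records, which is the framing CodedInputStream.readMessage() consumes. In Java, the one piece of that loop that can safely be reused is the raw byte buffer, not the messages. The sketch below shows that narrower technique. `FrameReader` is a hypothetical class, not a protobuf API.

Consistent with Kenton's point, this does not avoid allocating the message objects or Strings decoded from each record. It only avoids a fresh byte array per record.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch: read a stream of varint-length-prefixed records (the framing that
// CodedInputStream.readMessage() consumes) into one reusable byte buffer.
// Only the raw buffer is reused; any messages or Strings decoded from it are
// still fresh allocations per record.
final class FrameReader {
    private final InputStream in;
    private byte[] buf = new byte[256];
    private int length = -1;

    FrameReader(InputStream in) {
        this.in = in;
    }

    /** Reads the next record into buffer(); returns false at a clean end of stream. */
    boolean next() throws IOException {
        int first = in.read();
        if (first < 0) {
            return false; // no more records
        }
        int size = readVarint32(first);
        if (size > buf.length) {
            // Grow (never shrink) so later records of similar size reuse the array.
            buf = new byte[Math.max(size, buf.length * 2)];
        }
        int off = 0;
        while (off < size) {
            int n = in.read(buf, off, size - off);
            if (n < 0) {
                throw new EOFException("truncated record");
            }
            off += n;
        }
        length = size;
        return true;
    }

    /** The current record occupies buffer()[0 .. length()); overwritten by next(). */
    byte[] buffer() { return buf; }

    int length() { return length; }

    // Decodes a base-128 varint whose first byte has already been read.
    private int readVarint32(int firstByte) throws IOException {
        int result = firstByte & 0x7F;
        int b = firstByte;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
            b = in.read();
            if (b < 0) {
                throw new EOFException("truncated length prefix");
            }
            result |= (b & 0x7F) << shift;
        }
        if (result < 0) {
            throw new IOException("negative record length");
        }
        return result;
    }
}
```

Each record could then be handed to a parser over `buffer()` and `length()`, for example via `CodedInputStream.newInstance(buffer, 0, length)`. Whether the saved byte-array allocations matter next to the per-message objects is something only a benchmark on the real data would show.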