Re: Java deserialization - any best practices for performances?

2009-07-23 Thread alopecoid

Hi Kenton,

Thanks for your reply.

> You can't continue to use a Builder after calling build().  Even if we made
> it so you could, it would be building an entirely new object, not reusing
> the old one.  We can't make it reuse the old one because that would break
> the immutability guarantee of message objects.

Hmm... that strikes me as strange. I understand that the Message
objects are immutable, but the Builders are as well? I thought that
they would work more along the lines of String and StringBuilder,
where String is obviously immutable and StringBuilder is mutable/
reusable.
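
For readers unfamiliar with the analogy, here is a minimal plain-Java sketch of the String/StringBuilder pattern: one mutable builder is reset and reused, while every result is a fresh immutable String. The class and method names (ReuseDemo, render) are made up for this illustration and have nothing to do with the protobuf API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: one mutable StringBuilder is reused to produce
// many immutable String objects. All names here are hypothetical.
final class ReuseDemo {
    static List<String> render(int[][] rows) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();   // allocated once
        for (int[] row : rows) {
            sb.setLength(0);                      // reset the contents, keep the buffer
            for (int v : row) {
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(v);
            }
            out.add(sb.toString());               // each result is a separate String
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(render(new int[][] {{1, 2}, {3}}));
    }
}
```

Note that even here each toString() call allocates a new String; only the builder's internal buffer is reused.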

> But seriously, object allocation with a modern generational garbage
> collector is extremely cheap, especially for objects that don't stick around
> very long.  So I don't think there's much to gain here.

While I agree that object allocation is relatively cheap in Java, I
have noticed that if you generate a lot of garbage, you have to also
spend some time tweaking the garbage collector settings to avoid long/
frequent garbage collection pauses. I know that there has been a lot
of recent work done in Java 7 (and experimentally in Java 6) to avoid
this, but I haven't had the opportunity to test this yet. In fact, I
find that often times this is the real difference in performance
between Java and C++ in the cases where C++ seems to perform
significantly faster... different object allocation practices (but
more importantly, implementation/design choices). I don't know how
well this holds true for a spectrum of different usage patterns, but
my experience has been more from the large scale data processing side
of things. And don't get me wrong, I'm actually one of the few people
(out of my closest colleagues) who think that data processing can and
should be done in Java over C++, but that's another discussion
entirely :)

But while we're on the subject, I have been looking for some rough
benchmarks comparing the performance of Protocol Buffers in Java
versus C++. Do you (the collective you) have any [rough] idea as to
how they compare performance-wise? I am thinking more in terms of
batch-style processing (disk I/O, parsing centric) rather than RPC
centric usage patterns. Any experiences you can share would be great.

Thanks!
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Java deserialization - any best practices for performances?

2009-07-23 Thread Kenton Varda
On Thu, Jul 23, 2009 at 12:32 AM, alopecoid  wrote:

>
> Hi,
>
> I haven't actually used the Java protobuf API, but it seems to me from
> the quick occasional glance that this isn't entirely true. I mean,
> specifically in response to the code snippet posted in the original
> message, I would possibly:
>
> 1. Reuse the Builder object by calling its clear() method. This would
> save from the need to create a new Builder object for each iteration
> of the outermost loop.


You can't continue to use a Builder after calling build().  Even if we made
it so you could, it would be building an entirely new object, not reusing
the old one.  We can't make it reuse the old one because that would break
the immutability guarantee of message objects.

Reusing the actual builder object is not that useful since it's only a very
small object containing a pointer to a message object.
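
To illustrate the point, here is a toy model (not the generated protobuf code; ToyMessage and its Builder are hypothetical names) of a builder that is nothing more than a reference to the message under construction. build() hands that object to the caller, so any later write through the builder would change a message that is supposed to be immutable, and the sketch rejects it.

```java
// Toy model, NOT the generated protobuf code. All names are hypothetical.
final class ToyMessage {
    private int x;
    private int y;

    private ToyMessage() {}

    int getX() { return x; }
    int getY() { return y; }

    static final class Builder {
        // The builder's only state is this one reference.
        private ToyMessage result = new ToyMessage();

        Builder setX(int x) { check(); result.x = x; return this; }
        Builder setY(int y) { check(); result.y = y; return this; }

        ToyMessage build() {
            check();
            ToyMessage built = result;
            result = null;   // hand the object off; the builder no longer owns it
            return built;
        }

        private void check() {
            if (result == null) {
                throw new IllegalStateException("build() has already been called");
            }
        }
    }
}
```

Allowing reuse would mean allocating a new ToyMessage inside the builder after build(), which is the "entirely new object" mentioned above.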


> 2. Iterate over the repeated field using the get*Count() and get*
> (index) methods instead of the get*List() method. I'm not sure if this
> would save anything, but depending on how things are implemented in
> the generated code, this could save from allocating a new List object.


Won't save anything; we still need a list object internally.

But seriously, object allocation with a modern generational garbage
collector is extremely cheap, especially for objects that don't stick around
very long.  So I don't think there's much to gain here.




Re: Compiling on AIX 5.3 using xlC 3.55 compiler

2009-07-23 Thread Kenton Varda
It looks like unordered_map or unordered_set is not behaving correctly (e.g.
not finding matching keys when they are present) but I can't really tell
just from that output.
On Mon, Jul 20, 2009 at 11:53 AM, vikram  wrote:

>
> In my previous attempt a small test worked, but it did not work with the
> protocol buffer source code.
> I tried one more thing:
> #if defined (MISSING_HASH_MAP) && defined (__xlC__)
> #define hash_map std::tr1::unordered_map
>
> google/protobuf/unittest.proto:85:37: Expected ";".
> google/protobuf/unittest.proto:117:37: Expected ";".
> google/protobuf/unittest.proto:133:55: Expected identifier.
> google/protobuf/unittest.proto:134:55: Expected identifier.
> google/protobuf/unittest.proto:135:55: Expected identifier.
> google/protobuf/unittest.proto:136:55: Expected identifier.
> google/protobuf/unittest.proto:137:54: Expected identifier.
> google/protobuf/unittest.proto:138:55: Expected identifier.
> google/protobuf/unittest.proto:139:55: Expected identifier.
> google/protobuf/unittest.proto:140:55: Expected identifier.
> google/protobuf/unittest.proto:141:55: Expected identifier.
> google/protobuf/unittest.proto:142:54: Expected identifier.
> google/protobuf/unittest.proto:143:55: Expected identifier.
> google/protobuf/unittest.proto:144:55: Expected identifier.
> google/protobuf/unittest.proto:146:54: Expected identifier.
> google/protobuf/unittest.proto:147:54: Expected identifier.
> google/protobuf/unittest.proto:154:73: Expected identifier.
> google/protobuf/unittest.proto:155:57: Expected identifier.
> google/protobuf/unittest.proto:192:47: Expected ";".
> google/protobuf/unittest.proto:226:47: Expected ";".
> google/protobuf/unittest.proto:244:65: Expected identifier.
> google/protobuf/unittest.proto:245:65: Expected identifier.
> google/protobuf/unittest.proto:246:65: Expected identifier.
> google/protobuf/unittest.proto:247:65: Expected identifier.
> google/protobuf/unittest.proto:248:64: Expected identifier.
> google/protobuf/unittest.proto:249:65: Expected identifier.
> google/protobuf/unittest.proto:250:65: Expected identifier.
> google/protobuf/unittest.proto:251:65: Expected identifier.
> google/protobuf/unittest.proto:252:65: Expected identifier.
> google/protobuf/unittest.proto:253:64: Expected identifier.
> google/protobuf/unittest.proto:254:65: Expected identifier.
> google/protobuf/unittest.proto:255:65: Expected identifier.
> google/protobuf/unittest.proto:257:64: Expected identifier.
> google/protobuf/unittest.proto:258:64: Expected identifier.
> google/protobuf/unittest.proto:268:64: Expected identifier.
> google/protobuf/unittest.proto:269:68: Expected identifier.
> google/protobuf/unittest.proto:276:42: Expected identifier.
> google/protobuf/unittest.proto:379:26: Expected ";".
> google/protobuf/unittest.proto:380:26: Expected ";".
> google/protobuf/unittest.proto:451:47: Expected identifier.
> google/protobuf/unittest.proto:452:47: Expected identifier.
> google/protobuf/unittest.proto:453:47: Expected identifier.
> google/protobuf/unittest.proto:454:47: Expected identifier.
> google/protobuf/unittest.proto:455:47: Expected identifier.
> google/protobuf/unittest.proto:460:46: Expected identifier.
> gmake: *** [unittest_proto_middleman] Error 1
>
> So I guess it won't even compile with unordered_map provided on Linux.
> Please provide some input on this one.
>
> Thanks & Regards,
> Vikram
>
> On Jul 14, 6:35 pm, Kenton Varda  wrote:
> > It looks like your implementation of hash_map is not working correctly --
> > all lookups are failing.  You might try writing a little test for hash_map
> > itself that would be easier to debug.
> >
> > On Tue, Jul 14, 2009 at 6:27 PM, vikram  wrote:
> >
> > > Kenton & Monty,
> >
> > > I added a hack as follows in hash.h
> >
> > > // File changed .
> >
> > > #if defined(HAVE_HASH_MAP) && defined(HAVE_HASH_SET)
> > > #include HASH_MAP_H
> > > #include HASH_SET_H
> > > #elif  defined (__xlC__)
> > > #define MISSING_HASH
> > > #include 
> > > #include 
> > > #else
> > > #define MISSING_HASH
> > > #include 
> > > #include 
> > > #endif
> >
> > > namespace google {
> > > namespace protobuf {
> > > #if defined(MISSING_HASH) && defined(__xlC__)
> >
> > > //@TODO
> > > //Inherit hash_map from unordered_map
> > > template <typename Key>
> > > struct hash : public std::tr1::hash<Key> {
> > > };
> >
> > > template <typename Key>
> > > struct hash<const Key*> {
> > >   inline size_t operator()(const Key* key) const {
> > >     return reinterpret_cast<size_t>(key);
> > >   }
> > > };
> >
> > > template <typename Key, typename Data,
> > >           typename HashFcn = hash<Key>,
> > >           typename EqualKey = std::equal_to<Key> >
> > > class hash_map : public std::tr1::unordered_map<
> > >     Key, Data, HashFcn, EqualKey> {
> > > };
> >
> > > template <typename Key,
> > >           typename HashFcn = hash<Key>,
> > >           typename EqualKey = std::equal_to<Key> >
> > > class hash_set : public std::tr1::unordered_set<
> > >     Key, HashFcn, EqualKey> {
> > > };
> > > #elif defined(MISSING_HASH)
> >
> > > File continues as it is
> >
> > > Stack trace
> >
> > > pthrea

Re: Detecting message type prior parsing a ByteBuffer containing the message

2009-07-23 Thread jasonh



On Jul 23, 12:54 am, Benedikt Hallinger wrote:
> Hello (and sorry for bothering) and thank you for the clarification.
>
> > The "type" is simply transmitted as a tag on the wire. The wire format
> > of extensions is just like the wire format of a regular field - the
> > difference is that the containing type (Packet) need not know about
> > the extension fields in the .proto definition. Instead, each payload
> > type that you might include will have a unique extension number. When
> > you include one of the payload messages in your program, it will
> > register its extensions of Packet. Then when a packet message is
> > parsed, if it sees a tag that is not defined in the .proto but is a
> > valid extension, it looks up the extension information and parses it
> > appropriately.
>
> The idea is good and clear, but I have problems figuring out the code.
> The main problem is that the PacketManager should be designed to be
> protocol-agnostic, so it can deal with all protocols designed in the
> extension way you described.
>
> That said, if I understood it correctly, I would write it as follows:
> ---File: myProto.proto---
> message Packet {
> extensions n to m;
>
> }
>
> // Payload messages
> // the type field must be the same for all messages, with differing tag number
> message TestMessage {
> extend Packet {
> optional TestMessage type = 10;
> }
>
> required string msg = 2;
>
> }
>
> -
>
> ---File: PacketHandler.java---
> class PacketHandler {
> ...
>
> // Gets called whenever a new message arrives
> // ByteBuffer is passed to this method containing
> // the raw data from the net.
> public void messageRecieved(ByteBuffer buffer) {
> byte[] bytes = buffer.array(); // fetch bytes from buffer
>
> // parse the header data;
> // this.header is passed in the constructor of PacketHandler;
> // it is an instance of a "Packet" message.
> ExtendableMessage header =
> this.header.newBuilderForType().mergeFrom(bytes).build();

You need to provide the ExtensionRegistry to the builder when invoking
mergeFrom() so that it knows how to parse the extended types.

>
> // lets see what tag number the extension has, this is the type identifier.
> // this is done via the extension registry.
> ExtensionRegistry extReg = ExtensionRegistry.newInstance();
> int type = extReg.findExtensionByName("type").descriptor().getNumber();

This doesn't access anything in the message that was transmitted - it
just sees if there was an extension registered with the name type.
Instead, you can use the reflection methods to determine which fields
are set in the parsed message - maybe something like:

Map<FieldDescriptor, Object> field_map = header.getAllFields();
for (Map.Entry<FieldDescriptor, Object> field : field_map.entrySet()) {
  // getNumber() is the extension number of the message that was
  // transmitted over the wire.
  HandleType(field.getKey().getNumber(), field.getValue());
}
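
As a rough sketch of how a HandleType-style dispatch could avoid a hardcoded switch, a plain map from field number to handler may be enough. The Dispatcher class and Handler interface below are hypothetical names for illustration, not part of the protobuf library; the field number is assumed to come from the descriptor of each field reported by the parsed message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: maps an extension field number to the code
// that knows how to process that payload type.
final class Dispatcher {
    interface Handler {
        void handle(Object payload);
    }

    private final Map<Integer, Handler> handlers = new HashMap<>();

    // Called once per payload type, e.g. at startup.
    void register(int fieldNumber, Handler handler) {
        handlers.put(fieldNumber, handler);
    }

    // Returns true if a handler was registered for this field number.
    boolean dispatch(int fieldNumber, Object payload) {
        Handler handler = handlers.get(fieldNumber);
        if (handler == null) {
            return false;   // unknown type: caller decides whether to log or drop
        }
        handler.handle(payload);
        return true;
    }
}
```

Adding a new message type then only requires one more register() call rather than another switch case.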

>
> // invoke the right handler for that message, it will take care of the rest:
> getHandler(type).handle(header.getExtension(type));
>
> }
>
> ...
> }
>
> -
>
> However, when I tried that, it gave me errors I can't solve.
> As long as I work with concrete classes and names, all is okay, but that is
> not what I want here. What I want is to avoid that hardcoded "switch"
> statement on types, so it is easier to add new message types without having
> to fiddle around with much code.



Re: Writing a router that can route messages

2009-07-23 Thread Kaj Bjurman

Thanks.

On 16 Juli, 00:43, Kenton Varda  wrote:
> On Wed, Jul 15, 2009 at 10:56 AM, jasonh  wrote:
> > Foo f = Foo.newBuilder().mergeFrom(...).setCookie(newCookieVal).build();
>
> There's actually a shortcut for this:  the toBuilder() method of the Message
> interface returns a Builder that is pre-initialized as a copy of that
> message.
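
A toy, library-free sketch of that copy-then-modify pattern is shown here. Envelope and its fields are hypothetical; the real generated classes provide newBuilder() and toBuilder() for you.

```java
// Toy model of the copy-then-modify pattern; not the protobuf API itself.
final class Envelope {
    private final String cookie;
    private final String body;

    private Envelope(String cookie, String body) {
        this.cookie = cookie;
        this.body = body;
    }

    String getCookie() { return cookie; }
    String getBody() { return body; }

    static Builder newBuilder() { return new Builder(); }

    // Returns a builder pre-filled with this message's values.
    Builder toBuilder() {
        return new Builder().setCookie(cookie).setBody(body);
    }

    static final class Builder {
        private String cookie = "";
        private String body = "";

        Builder setCookie(String cookie) { this.cookie = cookie; return this; }
        Builder setBody(String body) { this.body = body; return this; }

        // Every call creates a new immutable Envelope.
        Envelope build() { return new Envelope(cookie, body); }
    }
}
```

With this shape, a router could write `original.toBuilder().setCookie(newValue).build()` and leave the original message untouched.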



Re: Writing a router that can route messages

2009-07-23 Thread Kaj Bjurman

Thanks Jason,

Yes all messages would use the same tag number for the "cookie" field,
and all messages must have that field, but that is all that the router
should know, and it must be able to append data to that field.
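
For what it's worth, a router like this can find such a field without knowing the message type, because each field on the wire is preceded by a varint key equal to (field_number << 3) | wire_type. The following is a minimal, library-free sketch that scans a serialized message for the first length-delimited field with a given number. CookieScanner is a hypothetical name, groups (wire types 3 and 4) are not handled, and real code would more likely rely on the library's CodedInputStream than on hand-written parsing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: scans top-level fields of a serialized message.
final class CookieScanner {
    private final byte[] buf;
    private int pos;

    private CookieScanner(byte[] buf) {
        this.buf = buf;
    }

    // Returns the UTF-8 value of the first length-delimited field with the
    // given number, or null if no such field is present.
    static String findString(byte[] message, int fieldNumber) {
        CookieScanner s = new CookieScanner(message);
        while (s.pos < s.buf.length) {
            long key = s.readVarint();
            int number = (int) (key >>> 3);
            int wireType = (int) (key & 7);
            switch (wireType) {
                case 0: s.readVarint(); break;   // varint value
                case 1: s.skip(8); break;        // fixed 64-bit value
                case 5: s.skip(4); break;        // fixed 32-bit value
                case 2: {                        // length-delimited value
                    int len = (int) s.readVarint();
                    if (len < 0 || len > s.buf.length - s.pos) {
                        throw new IllegalArgumentException("truncated field");
                    }
                    if (number == fieldNumber) {
                        return new String(s.buf, s.pos, len, StandardCharsets.UTF_8);
                    }
                    s.pos += len;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return null;
    }

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last.
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint too long");
    }

    private void skip(int n) {
        if (n > buf.length - pos) {
            throw new IllegalArgumentException("truncated field");
        }
        pos += n;
    }
}
```

Modifying the value would additionally require re-encoding its length prefix, which this sketch does not attempt.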

I think I have understood what to do now, and thanks for the tip about
where to look when it comes to modifying messages.

Kaj



On 15 Juli, 19:56, jasonh  wrote:
> On Jul 15, 5:33 am, Kaj Bjurman  wrote:
>
> > Hi,
>
> > I have just read the tutorials and some threads in these forums,
> > but there's one thing that I can't find an answer for. I'm currently
> > using a router that is responsible for routing different messages to
> > different servers. The protocol that we are currently using is very
> > simple. Basically header:tag=value:tag=value: and so on.
>
> > The router looks for a field with certain name (e.g. cookie) and
> > routes a message based on the information in the value of that tag,
> > and also modifies that field.
>
> > Is it possible to write a similar router for PB? The router that we
> > have shouldn't know about all message types. It just needs to know that
> > a field with a certain tag always should exist and that the field has
> > a value. I don't parse the data of the wire into messages, do
> > modifications and then write them back to the wire.
>
> > So, in short. How to do that using PB?
>
> I'm a little unclear about your requirement for arbitrary types. Do
> your message types all define some common header values? The wire
> format doesn't have a notion of tag names, simply tag numbers and
> types. Therefore, unless all your messages define "cookie" to have the
> same tag number, the router won't know what field to look for.
> However, I think you can accomplish what you want by defining a header
> message containing the fields your router needs to know about, and
> include the payload as raw data that is interpreted by clients who
> know what specific message type to read.
>
> You can make sure that the field exists by labeling it "required" in
> the message definition - then AbstractMessage.isInitialized() will
> fail if the field has not been set.
>
>
>
> > I'm also a bit curious about how to decode arbitrary messages, modify
> > them and write them back (even if I won't do that). How do you
> > normally modify messages since they are immutable? Is there a way to
> > get a builder for a certain message, and have the builder initialized
> > so that it represents a copy of the original message?
>
> Yes, the Builder class is what you want here. You can initialize a
> builder using the original message, or better yet directly from the
> serialized encoding from the wire (assuming you don't care about the
> original message).
>
> Foo f = Foo.newBuilder().mergeFrom(...).setCookie(newCookieVal).build();
>
> Hope that helps,
> Jason
>
> > I'm using Java if that makes a difference
>
> > Thanks
> > Kaj



Re: Detecting message type prior parsing a ByteBuffer containing the message

2009-07-23 Thread Benedikt Hallinger


Hello (and sorry for bothering) and thank you for the clarification.

> The "type" is simply transmitted as a tag on the wire. The wire format
> of extensions is just like the wire format of a regular field - the
> difference is that the containing type (Packet) need not know about
> the extension fields in the .proto definition. Instead, each payload
> type that you might include will have a unique extension number. When
> you include one of the payload messages in your program, it will
> register its extensions of Packet. Then when a packet message is
> parsed, if it sees a tag that is not defined in the .proto but is a
> valid extension, it looks up the extension information and parses it
> appropriately.

The idea is good and clear, but I have problems figuring out the code.
The main problem is that the PacketManager should be designed to be
protocol-agnostic, so it can deal with all protocols designed in the
extension way you described.

That said, if I understood it correctly, I would write it as follows:
---File: myProto.proto---
message Packet {
extensions n to m;
}

// Payload messages
// the type field must be the same for all messages, with differing tag number
message TestMessage {
extend Packet {
optional TestMessage type = 10;
}
required string msg = 2;

}
-

---File: PacketHandler.java---
class PacketHandler {
...

// Gets called whenever a new message arrives
// ByteBuffer is passed to this method containing
// the raw data from the net.
public void messageRecieved(ByteBuffer buffer) {
byte[] bytes = buffer.array(); // fetch bytes from buffer

// parse the header data;
// this.header is passed in the constructor of PacketHandler;
// it is an instance of a "Packet" message.
ExtendableMessage header =
this.header.newBuilderForType().mergeFrom(bytes).build();

// lets see what tag number the extension has, this is the type identifier.
// this is done via the extension registry.
ExtensionRegistry extReg = ExtensionRegistry.newInstance();
int type = extReg.findExtensionByName("type").descriptor().getNumber();

// invoke the right handler for that message, it will take care of the rest:
getHandler(type).handle(header.getExtension(type));
}

...
}
-


However, when I tried that, it gave me errors I can't solve.
As long as I work with concrete classes and names, all is okay, but that is
not what I want here. What I want is to avoid that hardcoded "switch"
statement on types, so it is easier to add new message types without having
to fiddle around with much code.




Re: Java deserialization - any best practices for performances?

2009-07-23 Thread alopecoid

Hi,

I haven't actually used the Java protobuf API, but it seems to me from
the quick occasional glance that this isn't entirely true. I mean,
specifically in response to the code snippet posted in the original
message, I would possibly:

1. Reuse the Builder object by calling its clear() method. This would
save from the need to create a new Builder object for each iteration
of the outermost loop.

2. Iterate over the repeated field using the get*Count() and get*
(index) methods instead of the get*List() method. I'm not sure if this
would save anything, but depending on how things are implemented in
the generated code, this could save from allocating a new List object.

Also, might "bytes" type fields perform better than any "string" type
fields that you may have in your particular data set? I'm not sure,
but it might be worth benchmarking.
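
A naive way to get a first impression is to compare UTF-8 decoding against a plain byte copy, on the assumption that a string field has to be decoded into a java.lang.String while a bytes field can stay as raw bytes. DecodeCost and its methods are hypothetical names, and the timing loop has no JIT warm-up, so treat the numbers as a rough hint rather than a benchmark.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical micro-comparison; not a measurement of protobuf itself.
final class DecodeCost {
    // Decodes every payload as UTF-8 and returns the total number of chars.
    static long decodeAll(byte[][] payloads) {
        long chars = 0;
        for (byte[] p : payloads) {
            chars += new String(p, StandardCharsets.UTF_8).length();
        }
        return chars;
    }

    // Copies every payload without decoding and returns the total number of bytes.
    static long copyAll(byte[][] payloads) {
        long bytes = 0;
        for (byte[] p : payloads) {
            bytes += Arrays.copyOf(p, p.length).length;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[][] payloads = new byte[100_000][];
        for (int i = 0; i < payloads.length; i++) {
            payloads[i] = ("value-" + i).getBytes(StandardCharsets.UTF_8);
        }
        long t0 = System.nanoTime();
        long chars = decodeAll(payloads);
        long t1 = System.nanoTime();
        long bytes = copyAll(payloads);
        long t2 = System.nanoTime();
        System.out.printf("decode: %d chars in %d us%n", chars, (t1 - t0) / 1000);
        System.out.printf("copy:   %d bytes in %d us%n", bytes, (t2 - t1) / 1000);
    }
}
```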

On Jul 18, 9:22 pm, Kenton Varda  wrote:
> On Fri, Jul 17, 2009 at 8:13 PM, Alex Black  wrote:
>
> > When I write out messages using C++ I'm careful to clear messages and
> > re-use them, is there something equivalent on the java side when
> > reading those same messages in?
>
> No.  Sorry.  This just doesn't fit at all with the Java library's design,
> and even if it did, you cannot reuse Java String objects, which often
> account for most of the memory usage.  However, memory allocation is cheaper
> in Java than in C++, so there's less to gain from it.
>
>
>
> > My code looks like:
>
> > CodedInputStream stream = CodedInputStream.newInstance(inputStream);
>
> > while ( !stream.isAtEnd() )
> > {
> >     MyMessage.Builder builder = MyMessage.newBuilder();
> >     stream.readMessage(builder, null);
> >     MyMessage myMessage = builder.build();
>
> >     for ( MessageValue messageValue : myMessage.getValuesList() )
> >     {
> >        ..
> >     }
> > }
>
> > I'm passing 150 messages each with 1000 items, so presumably memory is
> > allocated 150 times for each of the messages...
>
> > - Alex