Thanks, Evan. That was very helpful. I got rid of the external object
and created the internal objects directly. After that, the only part
that was taking time was decoding. I like the idea of using bytes for
serialization and doing my own encoding/decoding on top of that. That
way I can delay decoding until it is needed. For example, for
comparisons I should be able to just use the bytes. Also, do you think
it would be faster if I encoded/decoded using UTF-16? Clearly it is
not as compact.
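
To make the lazy-decoding idea concrete, here is a minimal sketch of what
I have in mind. LazyUtf8 is a hypothetical helper of my own, not part of
the protobuf API; it assumes the field is declared as bytes and that the
raw array has been copied out of it (for example with
ByteString.toByteArray()). It only shows the pattern and has not been
benchmarked.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical wrapper (not part of the protobuf API): holds the raw UTF-8
 * bytes of a field and decodes them to a String only on first use.
 */
final class LazyUtf8 implements Comparable<LazyUtf8> {
    private final byte[] utf8;   // raw bytes, e.g. from ByteString.toByteArray()
    private String decoded;      // cached result of the first decode

    LazyUtf8(byte[] utf8) {
        this.utf8 = utf8.clone();
    }

    static LazyUtf8 of(String s) {
        return new LazyUtf8(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes on the first call; later calls return the cached String. */
    String get() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    boolean isDecoded() {
        return decoded != null;
    }

    /** Equality on the raw bytes, so no decoding is needed. */
    @Override
    public boolean equals(Object o) {
        return o instanceof LazyUtf8 && Arrays.equals(utf8, ((LazyUtf8) o).utf8);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(utf8);
    }

    /** Unsigned byte-wise order; for valid UTF-8 this is code point order. */
    @Override
    public int compareTo(LazyUtf8 other) {
        int n = Math.min(utf8.length, other.utf8.length);
        for (int i = 0; i < n; i++) {
            int a = utf8[i] & 0xff;
            int b = other.utf8[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return utf8.length - other.utf8.length;
    }
}
```

Two caveats with this sketch. Byte-wise ordering of UTF-8 follows code
point order, which can differ from String.compareTo for characters
outside the Basic Multilingual Plane, and byte equality assumes both
sides were encoded the same way (same normalization). On the UTF-16
side, ASCII-heavy text takes twice the space ("hello" is 5 bytes in
UTF-8 and 10 in UTF-16LE), so whether it ends up faster overall is
something I would have to measure.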

On Aug 22, 11:58 am, Evan Jones <ev...@mit.edu> wrote:
> On Aug 19, 2010, at 11:45 , achintms wrote:
>
> > I have an application that is reading data from disk and is using
> > proto buffers to create java objects. When doing performance analysis
> > I was surprised to find out that most of the time was spent in and
> > around proto buffers and not reading data from disk.
>
> In my experience, protocol buffers are more than fast enough to be  
> able to keep up with disk speeds. That is, when reading uncached data  
> from the disk at 100 MB/s, protocol buffers can decode it at that  
> speed. Now, if your data is cached, and your application is not doing  
> much with the data, then I would expect protocol buffers to take 100%  
> of the CPU time, since the disk read doesn't take CPU, and your  
> application isn't doing much.
>
> In other words: in a more "real" application, I would expect protocol  
> buffers will take only a very small portion of your application's time.
>
> > Again I expected that decoding strings would be almost all the time
> > (although decoding here still seems slower than in C in my
> > experience). I am trying to figure out why mergeFrom method for this
> > message is taking 6 sec (own time).
>
> Decoding strings in Java is way slower because it actually decodes the  
> UTF-8 encoded strings into UTF-16 strings in memory. The C++ version  
> just leaves the data in UTF-8. If this is a performance issue for your  
> application, you may wish to consider using the bytes protocol buffer  
> type rather than strings. This is less convenient, and means you can  
> "screw up" by accidentally sending invalid data, but is faster.
>
> > There are around 15 SubMessages.
>
> This is basically the problem right here. Each time you parse one of  
> these messages, it ends up allocating a new object for each of these  
> sub messages, and a new object for each string inside them. This is  
> pretty slow.
>
> As I said above: I suspect that in a "real" application, this won't be  
> a problem. However, it would be faster if you get rid of all the sub  
> messages (assuming that you don't actually need them for some other  
> reason).
>
> Finally, I'll take a moment to promote my patch that improves Java  
> message *encoding* performance, by optimizing string encoding. It is  
> available at the following URL. Unfortunately, there is no similar  
> approach to improving the decoding performance.
>
> http://codereview.appspot.com/949044/
>
> Evan
>
> --
> Evan Jones
> http://evanjones.ca/
