Thanks Evan. That was very helpful. I got rid of the external object and created the internal objects directly; after that, the only part that was still taking time was decoding. I like the idea of using bytes for serialization and doing my own encoding/decoding on top of that, since it lets me delay decoding until it is actually needed. For example, for comparisons I should be able to use the raw bytes directly. Also, do you think encoding/decoding with UTF-16 would be faster? Clearly it is not as compact.
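A rough sketch of what I have in mind (plain JDK only; the class and method names here are mine, not from any protobuf API): keep the raw UTF-8 bytes from a bytes field, compare at the byte level, and decode to a String only on first use:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical lazy wrapper around the raw UTF-8 bytes of a `bytes` field.
final class LazyString {
    private final byte[] utf8;   // bytes exactly as read off the wire
    private String decoded;      // cached result of the first decode

    LazyString(byte[] utf8) { this.utf8 = utf8; }

    // Equality can be checked on the raw bytes -- no decoding needed.
    boolean sameAs(LazyString other) {
        return Arrays.equals(utf8, other.utf8);
    }

    // Decode lazily, and only once.
    String asString() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}

public class LazyStringDemo {
    public static void main(String[] args) {
        LazyString a = new LazyString("hello".getBytes(StandardCharsets.UTF_8));
        LazyString b = new LazyString("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.sameAs(b));   // true: compared without decoding
        System.out.println(a.asString());  // hello: decoded on demand
    }
}
```

That way a record that is only ever compared or passed through never pays the UTF-8 to UTF-16 conversion cost at all.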
On Aug 22, 11:58 am, Evan Jones <ev...@mit.edu> wrote:
> On Aug 19, 2010, at 11:45 , achintms wrote:
>
> > I have an application that is reading data from disk and is using
> > proto buffers to create java objects. When doing performance analysis
> > I was surprised to find out that most of the time was spent in and
> > around proto buffers and not reading data from disk.
>
> In my experience, protocol buffers are more than fast enough to keep
> up with disk speeds. That is, when reading uncached data from the disk
> at 100 MB/s, protocol buffers can decode it at that speed. Now, if
> your data is cached, and your application is not doing much with the
> data, then I would expect protocol buffers to take 100% of the CPU
> time, since the disk read doesn't take CPU, and your application isn't
> doing much.
>
> In other words: in a more "real" application, I would expect protocol
> buffers to take only a very small portion of your application's time.
>
> > Again I expected that decoding strings would be almost all the time
> > (although decoding here still seems slower than in C in my
> > experience). I am trying to figure out why the mergeFrom method for
> > this message is taking 6 sec (own time).
>
> Decoding strings in Java is way slower because it actually decodes the
> UTF-8 encoded strings into UTF-16 strings in memory. The C++ version
> just leaves the data in UTF-8. If this is a performance issue for your
> application, you may wish to consider using the bytes protocol buffer
> type rather than string. This is less convenient, and means you can
> "screw up" by accidentally sending invalid data, but it is faster.
>
> > There are around 15 SubMessages.
>
> This is basically the problem right here. Each time you parse one of
> these messages, it ends up allocating a new object for each of these
> sub messages, and a new object for each string inside them. This is
> pretty slow.
> As I said above: I suspect that in a "real" application, this won't be
> a problem. However, it would be faster if you get rid of all the sub
> messages (assuming that you don't actually need them for some other
> reason).
>
> Finally, I'll take a moment to promote my patch that improves Java
> message *encoding* performance, by optimizing string encoding. It is
> available at the following URL. Unfortunately, there is no similar
> approach to improving the decoding performance.
>
> http://codereview.appspot.com/949044/
>
> Evan
>
> --
> Evan Jones
> http://evanjones.ca/
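For reference, the schema change Evan suggests would look something like this (the message and field names here are made up for illustration, not from the original schema):

```proto
// Hypothetical schema: switch a hot string field to bytes so that the
// Java parser keeps it as a ByteString instead of eagerly converting
// UTF-8 to a UTF-16 java.lang.String during mergeFrom().
message Record {
  // optional string name = 1;  // before: decoded eagerly into a String
  optional bytes name = 1;      // after: raw bytes, decode only when needed
}
```

The trade-off, as noted above, is that the runtime no longer validates that the field contains well-formed UTF-8, so the application must guard against invalid data itself.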