Somehow I missed that message. Sorry about that. I'd definitely like to have lazy parsing (as an option) in the official implementation. The reason I'm "stressing" is that there are a lot of things like this that I'd like protocol buffers to have, but I don't have enough time to write them all myself, so I need help from contributors. Unfortunately, it seems that a lot of people would rather write their own implementations from scratch than try to contribute to the main one -- you aren't the first person who has done this. That said, having competition is a good thing too.
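To make the lazy-parsing trade-offs argued over in the quoted thread below more concrete, here is a minimal Java sketch of a lazily decoded, immutable message wrapper. It is not the API of either implementation under discussion. `LazyMessage` and its `decoder` function are hypothetical names, the demo uses a trivial UTF-8 "decoder" as a stand-in for a real protobuf parser, and the sketch assumes the decoder is pure and never returns null. It shows three properties from the discussion: decoding happens on first access, re-serializing an unmodified message reuses the original bytes, and a race between two first accesses only costs a duplicate decode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch of a lazily decoded, immutable message wrapper.
 * LazyMessage and its decoder are illustrative names, not a protobuf API.
 */
public final class LazyMessage<T> {
    private final byte[] encoded;              // wire bytes; never modified after construction
    private final Function<byte[], T> decoder; // assumed pure, idempotent, never returns null
    private volatile T decoded;                // null until the first get()

    public LazyMessage(byte[] encoded, Function<byte[], T> decoder) {
        this.encoded = encoded.clone();
        this.decoder = decoder;
    }

    /** Decodes on first access; afterwards costs one volatile read and a null check. */
    public T get() {
        T value = decoded;
        if (value == null) {
            // No lock: if two threads race here, each decodes the same bytes into
            // an equal result and either write may win. Parse errors thrown by the
            // decoder surface here, at access time, not when the bytes arrived.
            value = decoder.apply(encoded);
            decoded = value;
        }
        return value;
    }

    /** Re-serializing an unmodified message reuses the original bytes (no re-encode). */
    public byte[] toByteArray() {
        return encoded.clone(); // defensive copy keeps the internal array immutable
    }

    public static void main(String[] args) {
        AtomicInteger decodeCalls = new AtomicInteger();
        byte[] wire = "hello".getBytes(StandardCharsets.UTF_8);
        LazyMessage<String> msg = new LazyMessage<>(wire, bytes -> {
            decodeCalls.incrementAndGet();
            return new String(bytes, StandardCharsets.UTF_8);
        });

        byte[] forwarded = msg.toByteArray();               // forward without decoding
        System.out.println(decodeCalls.get());              // 0
        System.out.println(msg.get() + " " + msg.get());    // hello hello
        System.out.println(decodeCalls.get());              // 1
        System.out.println(Arrays.equals(forwarded, wire)); // true
    }
}
```

The `decoded` field is declared `volatile` here as a design assumption. A plain, unlocked null check (as in the `if (bean == null)` description in the thread) is only safe if the decoded object is safely published, for example by giving it all-`final` fields; otherwise another thread may observe a partially initialized object, which is the concern behind the suggestion to lock a mutex.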
Regarding Maven plugins -- why can't the plugin just invoke protoc using Runtime.exec()? What's the benefit of having the code generator running inside the Maven process? Honest question -- I don't know very much about Maven.

On Fri, Sep 18, 2009 at 7:36 PM, [email protected] <[email protected]> wrote:
>
> Firstly, I want to clarify that I did not write the benchmark that I plugged into. There is no ill intent. I published the benchmark so that folks would take the time to look into why my implementation performed so much better. I think it's good to have healthy discussions about the pros and cons of alternative implementations which deliver different sets of features.
>
> The main reason I started from scratch is that I wanted to implement a Java-based code generator so that it would be easy to embed in a Maven plugin or Ant task. Furthermore, it was just more expedient to start from a clean slate and design my ideal object model.
> I did ping this list over a year ago to gauge whether there would be any interest in collaborating, but it did not garner any interest, so I did not pursue it further:
>
> http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea8706b40146f/bdd22ddf89e4a6d3?#bdd22ddf89e4a6d3
>
> Perhaps I'm misreading you, but it seems like there have been very few ideas from my implementation that you are actually interested in. So I'm not sure why you're stressing about me rolling this out as a new implementation.
>
> Bottom line: I would LOVE IT if the Google implementation achieves feature parity with mine. That way it's one less code base I need to maintain! Best of luck, and if you do change your mind and want to poach any of the concepts or code, please feel free to do so.
>
> Regards,
> Hiram
>
> On Sep 18, 9:40 pm, Kenton Varda <[email protected]> wrote:
> > I think the usual way we would have solved this problem at Google would be to have the message "payload" be encoded separately and embedded in the "envelope" as a "bytes" field, e.g.:
> >
> > message Envelope {
> >   required string to_address = 1;
> >   optional string from_address = 2;
> >   required bytes payload = 3;  // an encoded message
> > }
> >
> > It's not as transparent as your solution, but it is a whole lot simpler, and the behavior is easy to understand.
> >
> > That said, again, there's nothing preventing lazy parsing from being added to Google's Java protobuf implementation, so I'm not sure why writing something completely new was necessary.
> >
> > As far as the performance arguments go, I'd again encourage you to create a benchmark that actually measures the performance of the case where the application code ends up accessing all the fields. If you really think there's no significant overhead, prove it. :)
> >
> > I'd also suggest that you not publish benchmarks implying that your implementation is an order of magnitude faster at parsing without explaining what is really going on. It's rather misleading.
> >
> > On Fri, Sep 18, 2009 at 5:53 PM, [email protected] <[email protected]> wrote:
> >
> > > Hi Kenton,
> > >
> > > Let me start off by describing my usage scenario.
> > >
> > > I'm interested in using protobuf to implement the messaging protocol between clients and servers of a distributed messaging system. For simplicity, let's pretend that the protocol is similar to XMPP and that there are servers which handle delivering messages to and from clients.
> > >
> > > In this case, the server clearly is not interested in the meat of the messages being sent around. It is typically only interested in routing data. In this case, deferred decoding provides a substantial win.
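Kenton's `Envelope` suggestion and Hiram's routing scenario can be combined in a small, self-contained Java sketch. To avoid depending on the protobuf runtime, it hand-encodes the protobuf wire format for the `Envelope` message quoted above: each field is a varint key `(field_number << 3) | wire_type`, then a varint length, then the bytes, and all three fields here use wire type 2 (length-delimited). `EnvelopeSketch`, `Route`, and the helper methods are hypothetical names; real code would use protoc-generated classes. The router reads `to_address`, skips `from_address`, and hands back `payload` as opaque bytes, so the inner message is neither decoded nor re-encoded on the way through.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch: hand-encodes the protobuf wire format for
 *   message Envelope { required string to_address = 1;
 *                      optional string from_address = 2;
 *                      required bytes payload = 3; }
 * and routes on to_address without ever parsing payload.
 */
public final class EnvelopeSketch {
    private static final int LENGTH_DELIMITED = 2; // wire type for string/bytes/embedded messages

    static final class Route {
        final String toAddress;
        final byte[] payload; // still encoded; the router never looks inside
        Route(String toAddress, byte[] payload) {
            this.toAddress = toAddress;
            this.payload = payload;
        }
    }

    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {        // more than 7 bits remain
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation bit
            value >>>= 7;
        }
        out.write(value);
    }

    static void writeLengthDelimited(ByteArrayOutputStream out, int field, byte[] data) {
        writeVarint(out, (field << 3) | LENGTH_DELIMITED); // key = field number + wire type
        writeVarint(out, data.length);
        out.write(data, 0, data.length);
    }

    static byte[] encodeEnvelope(String to, String from, byte[] encodedPayload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLengthDelimited(out, 1, to.getBytes(StandardCharsets.UTF_8));
        if (from != null) {
            writeLengthDelimited(out, 2, from.getBytes(StandardCharsets.UTF_8));
        }
        writeLengthDelimited(out, 3, encodedPayload); // copied verbatim, not re-encoded
        return out.toByteArray();
    }

    static int readVarint(byte[] buf, int[] pos) {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            if (pos[0] >= buf.length) throw new IllegalArgumentException("truncated varint");
            byte b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /** Reads only what a router needs; from_address and unknown fields are skipped. */
    static Route route(byte[] envelope) {
        int[] pos = {0};
        String to = null;
        byte[] payload = null;
        while (pos[0] < envelope.length) {
            int key = readVarint(envelope, pos);
            if ((key & 0x7) != LENGTH_DELIMITED) {
                throw new IllegalArgumentException("unexpected wire type " + (key & 0x7));
            }
            int len = readVarint(envelope, pos);
            if (len < 0 || len > envelope.length - pos[0]) {
                throw new IllegalArgumentException("truncated field");
            }
            int field = key >>> 3;
            if (field == 1) {
                to = new String(envelope, pos[0], len, StandardCharsets.UTF_8);
            } else if (field == 3) {
                payload = Arrays.copyOfRange(envelope, pos[0], pos[0] + len);
            }
            pos[0] += len;
        }
        if (to == null || payload == null) {
            throw new IllegalArgumentException("missing required field");
        }
        return new Route(to, payload);
    }

    public static void main(String[] args) {
        byte[] inner = {0x08, (byte) 0x96, 0x01}; // an already-encoded message: field 1 = 150
        byte[] wire = encodeEnvelope("bob@example.com", "alice@example.com", inner);
        Route r = route(wire);
        System.out.println(r.toAddress);                     // bob@example.com
        System.out.println(Arrays.equals(r.payload, inner)); // true
    }
}
```

The cost is the one Kenton names: the payload is not parsed transparently on field access, so whichever endpoint finally needs its contents has to parse it explicitly.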
> > >
> > > Furthermore, when the server passes on the message to the consumer, it does not need to encode the message again. For important messages, the server may be configured to persist those messages as they come in, so the server would once again benefit from not having to encode the message yet again.
> > >
> > > I don't think the user could implement those optimizations on their own without support from the protobuf implementation. At least not as efficiently and elegantly. You have to realize that the 'free encoding' holds true even for nested message structures in the message. So let's say that the user is aggregating data from multiple source protobuf messages, picking data out of them and placing it into a new protobuf message that then gets encoded. Only the outer message would need encoding; the inner nested elements which were picked from the other buffers would benefit from the 'free encoding'.
> > >
> > > The overhead of the lazy decoding is exactly 1 extra "if (bean == null)" statement, which is probably cheaper than most virtual dispatch invocations. But if you're really trying to milk the performance out of your app, you should just call buffer.copy() to get the bean backing the buffer. All get operations on the bean do NOT have the overhead.
> > >
> > > Regarding threading, since the buffer is immutable and decoding is idempotent, you don't really need to worry about thread safety. The worst-case scenario is that 2 threads decode the same buffer concurrently and then set the bean field of the buffer. Since the resulting beans are equal, in most cases it would not really matter which thread wins when they overwrite the bean field.
> > >
> > > As for up-front validation, in my use case, deferring validation is a feature. The less work the server has to do the better, since it will help scale vertically. I do agree that in some use cases it would be desirable to fully validate up front. I think it should be up to the application to decide if it wants up-front validation or deferred decoding. For example, the client of the messaging protocol would likely opt for up-front validation, while the server would use deferred decoding. It's definitely a performance versus consistency trade-off.
> > >
> > > I think that once you make 'free encoding' and deferred decoding an option, users that have high-performance use cases will design their applications so that they can exploit those features as much as possible.
> > >
> > > --
> > > Regards,
> > > Hiram
> > >
> > > Blog: http://hiramchirino.com
> > >
> > > Open Source SOA
> > > http://fusesource.com/
> > >
> > > On Sep 18, 6:43 pm, Kenton Varda <[email protected]> wrote:
> > > > Hmm, your bean and buffer classes sound conceptually equivalent to my builder and message classes.
> > > > Regarding lazy parsing, this is certainly something we've considered before, but it introduces a lot of problems:
> > > >
> > > > 1) Every getter method must now first check whether the message is parsed, and parse it if not. Worse, for proper thread safety it really needs to lock a mutex while performing this check. For a fair comparison of parsing speed, you really need another benchmark which measures the speed of accessing all the fields of the message. I think you'll find that parsing a message *and* accessing all its fields is significantly slower with the lazy approach. Your approach might be faster in the case of a very deep message in which the user only wants to access a few shallow fields, but I think this case is relatively uncommon.
> > > >
> > > > 2) What happens if the message is invalid? The user will probably expect that calling simple getter methods will not throw parse exceptions, and probably isn't in a good position to handle these exceptions. You really want to detect parse errors at parse time, not later on down the road.
> > > >
> > > > We might add lazy parsing to the official implementation at some point. However, the approach we'd probably take is to use it only on fields which are explicitly marked with a "[lazy=true]" option. Developers would use this to indicate fields for which the performance trade-offs favor lazy parsing, and they are willing to deal with delayed error-checking.
> > > >
> > > > In your blog post you also mention that encoding the same message object multiple times without modifying it in between, or parsing a message and then serializing it without modification, is "free"... but how often does this happen in practice? These seem like unlikely cases, and easy for the user to optimize on their own without support from the protobuf implementation.
> > > >
> > > > On Fri, Sep 18, 2009 at 3:15 PM, [email protected] <[email protected]> wrote:
> > > >
> > > > > Hi Kenton,
> > > > >
> > > > > You're right, the reason that one benchmark has those results is that the implementation does lazy decoding. While lazy decoding is nice, I think that implementation has a couple of other features which are equally nice. See more details about them here:
> > > > >
> > > > > http://hiramchirino.com/blog/2009/09/activemq-protobuf-implemtation-f...
> > > > >
> > > > > It would have been hard to impossible to implement some of the stuff without the completely different class structure it uses. I'd be happy if its features could be absorbed into the official implementation. I'm just not sure how you could do that and maintain compatibility with your existing users.
> > > > >
> > > > > If you have any suggestions of how we can integrate better, please advise.
> > > > >
> > > > > Regards,
> > > > > Hiram
> > > > >
> > > > > On Sep 18, 12:34 pm, Kenton Varda <[email protected]> wrote:
> > > > > > So, his implementation is a little bit faster in two of the benchmarks, and impossibly faster in the other one. I don't really believe that it's possible to improve parsing time by as much as he claims, except by doing something like lazy parsing, which would just be deferring the work to later on. Would have been nice if he'd contributed his optimizations back to the official implementation rather than write a whole new one...
> > > > > >
> > > > > > On Fri, Sep 18, 2009 at 1:38 AM, ijuma <[email protected]> wrote:
> > > > > >
> > > > > > > Hey all,
> > > > > > >
> > > > > > > I ran across the following and thought it may be of interest to this list:
> > > > > > > http://hiramchirino.com/blog/2009/09/activemq-protobuf-implementation...
> > > > > > >
> > > > > > > Best,
> > > > > > > Ismael
