I think the usual way we would have solved this problem at Google would be
to have the message "payload" be encoded separately and embedded in the
"envelope" as a "bytes" field, e.g.:
  message Envelope {
    required string to_address = 1;
    optional string from_address = 2;
    required bytes payload = 3;  // an encoded message
  }

It's not as transparent as your solution, but it is a whole lot simpler, and
the behavior is easy to understand.
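
For example, roughly like this (ChatMessage here is just a stand-in for
whatever payload type the application defines; the rest is the standard
generated Java API):

  // Sender: encode the payload on its own, then embed it as bytes.
  ChatMessage chat = ChatMessage.newBuilder()
      .setBody("hello")
      .build();
  Envelope envelope = Envelope.newBuilder()
      .setToAddress("alice@example.com")
      .setFromAddress("bob@example.com")
      .setPayload(chat.toByteString())  // encoded message stored as raw bytes
      .build();

  // A routing server parses only the envelope; the payload stays opaque
  // and can be forwarded or persisted as-is, without being decoded.
  Envelope routed = Envelope.parseFrom(envelope.toByteArray());
  String destination = routed.getToAddress();

  // The final consumer decodes the payload when it actually needs it.
  // (parseFrom throws InvalidProtocolBufferException if the bytes are bad.)
  ChatMessage received = ChatMessage.parseFrom(routed.getPayload());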

That said, again, there's nothing preventing lazy parsing from being added
to Google's Java protobuf implementation, so I'm not sure why writing
something completely new was necessary.

As far as the performance arguments go, I'd again encourage you to create a
benchmark that actually measures the performance of the case where the
application code ends up accessing all the fields.  If you really think
there's no significant overhead, prove it.  :)
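
Something along these lines would do -- a rough sketch only, where
TestMessage and its fields are just placeholders for whatever type the
benchmark actually exercises:

  // Parse the same encoded message repeatedly and read every field, so
  // that any deferred decoding work cannot hide outside the timed loop.
  static long parseAndAccessAll(byte[] data, int iterations) throws Exception {
    long checksum = 0;  // consume every value so the JIT can't drop the getters
    for (int i = 0; i < iterations; i++) {
      TestMessage msg = TestMessage.parseFrom(data);
      checksum += msg.getId();
      checksum += msg.getName().length();
      for (TestMessage.Item item : msg.getItemList()) {
        checksum += item.getValue();
      }
    }
    return checksum;  // time this method against both implementations
  }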

I'd also suggest that you not publish benchmarks implying that your
implementation is an order of magnitude faster at parsing without explaining
what is really going on.  It's rather misleading.

On Fri, Sep 18, 2009 at 5:53 PM, hi...@hiramchirino.com
<chir...@gmail.com> wrote:

>
> Hi Kenton,
>
> Let me start off by describing my usage scenario.
>
> I'm interested in using protobuf to implement the messaging protocol
> between clients and servers of a distributed messaging system.  For
> simplicity, let's pretend that the protocol is similar to XMPP and that
> there are servers which handle delivering messages to and from clients.
>
> In this case, the server clearly is not interested in the meat of the
> messages being sent around.  It is typically only interested in the
> routing data, so deferred decoding provides a substantial win.
> Furthermore, when the server passes the message on to the consumer, it
> does not need to encode the message again.  For important messages,
> the server may be configured to persist them as they come in, so it
> would again benefit from not having to re-encode the message.
>
> I don't think the user could implement those optimizations on their
> own without support from the protobuf implementation, at least not as
> efficiently and elegantly.  You have to realize that the 'free
> encoding' holds true even for nested message structures.  So let's say
> the user is aggregating data from multiple source protobuf messages,
> picking data out of them and placing it into a new protobuf message
> that then gets encoded.  Only the outer message would need encoding;
> the inner nested elements that were picked from the other buffers
> would benefit from the 'free encoding'.
>
> The overhead of the lazy decoding is exactly one extra "if (bean ==
> null)" check, which is probably cheaper than most virtual dispatch
> invocations.  But if you're really trying to milk the performance out
> of your app, you should just call buffer.copy() to get the bean
> backing the buffer.  All get operations on the bean itself do NOT have
> that overhead.
>
> Regarding threading, since the buffer is immutable and decoding is
> idempotent, you don't really need to worry about thread safety.  The
> worst-case scenario is that two threads decode the same buffer
> concurrently and then both set the bean field of the buffer.  Since
> the resulting beans are equal, in most cases it does not really matter
> which thread wins when they overwrite the bean field.
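>
> In code the pattern is roughly this (a simplified sketch with
> illustrative names, not the actual generated class):
>
>   class MessageBuffer {
>     private final byte[] encoded;  // immutable encoded form
>     private MessageBean bean;      // decoded on first access
>
>     MessageBuffer(byte[] encoded) { this.encoded = encoded; }
>
>     MessageBean copy() {
>       if (bean == null) {                    // the one extra branch
>         bean = MessageBean.decode(encoded);  // idempotent; a racing thread
>       }                                      // just produces an equal bean
>       return bean;
>     }
>
>     String getToAddress() {
>       return copy().getToAddress();  // gets on the bean itself skip the check
>     }
>   }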
>
> As for up-front validation, in my use case deferring validation is a
> feature.  The less work the server has to do, the better, since that
> helps it scale vertically.  I do agree that in some use cases it would
> be desirable to fully validate up front.  I think it should be up to
> the application to decide whether it wants up-front validation or
> deferred decoding.  For example, the client of the messaging protocol
> would likely opt for up-front validation, while the server would use
> deferred decoding.  It's definitely a performance versus consistency
> trade-off.
>
> I think that once you make 'free encoding' and deferred decoding an
> option, users with high-performance use cases will design their
> applications so that they can exploit those features as much as
> possible.
>
> --
> Regards,
> Hiram
>
> Blog: http://hiramchirino.com
>
> Open Source SOA
> http://fusesource.com/
>
> On Sep 18, 6:43 pm, Kenton Varda <ken...@google.com> wrote:
> > Hmm, your bean and buffer classes sound conceptually equivalent to my
> > builder and message classes.
> > Regarding lazy parsing, this is certainly something we've considered
> > before, but it introduces a lot of problems:
> >
> > 1) Every getter method must now first check whether the message is
> > parsed, and parse it if not.  Worse, for proper thread safety it
> > really needs to lock a mutex while performing this check.  For a fair
> > comparison of parsing speed, you really need another benchmark which
> > measures the speed of accessing all the fields of the message.  I
> > think you'll find that parsing a message *and* accessing all its
> > fields is significantly slower with the lazy approach.  Your approach
> > might be faster in the case of a very deep message in which the user
> > only wants to access a few shallow fields, but I think this case is
> > relatively uncommon.
> >
> > 2) What happens if the message is invalid?  The user will probably expect
> > that calling simple getter methods will not throw parse exceptions, and
> > probably isn't in a good position to handle these exceptions.  You really
> > want to detect parse errors at parse time, not later on down the road.
> >
> > We might add lazy parsing to the official implementation at some
> > point.  However, the approach we'd probably take is to use it only on
> > fields which are explicitly marked with a "[lazy=true]" option.
> > Developers would use this to indicate fields for which the
> > performance trade-offs favor lazy parsing, and they are willing to
> > deal with delayed error-checking.
> >
> > In your blog post you also mention that encoding the same message
> > object multiple times without modifying it in between, or parsing a
> > message and then serializing it without modification, is "free"...
> > but how often does this happen in practice?  These seem like unlikely
> > cases, and easy for the user to optimize on their own without support
> > from the protobuf implementation.
> >
> > On Fri, Sep 18, 2009 at 3:15 PM, hi...@hiramchirino.com
> > <chir...@gmail.com> wrote:
> >
> > > Hi Kenton,
> >
> > > You're right, the reason that one benchmark has those results is
> > > because the implementation does lazy decoding.  While lazy decoding
> > > is nice, I think that implementation has a couple of other features
> > > which are equally nice.  See more details about them here:
> > >
> > > http://hiramchirino.com/blog/2009/09/activemq-protobuf-implemtation-f...
> > >
> > > It would have been hard to impossible to implement some of the stuff
> > > without the completely different class structure it uses.  I'd be
> > > happy if its features could be absorbed into the official
> > > implementation.  I'm just not sure how you could do that and maintain
> > > compatibility with your existing users.
> >
> > > If you have any suggestions of how we can integrate better please
> > > advise.
> >
> > > Regards,
> > > Hiram
> >
> > > On Sep 18, 12:34 pm, Kenton Varda <ken...@google.com> wrote:
> > > > So, his implementation is a little bit faster in two of the
> > > > benchmarks, and impossibly faster in the other one.  I don't
> > > > really believe that it's possible to improve parsing time by as
> > > > much as he claims, except by doing something like lazy parsing,
> > > > which would just be deferring the work to later on.  Would have
> > > > been nice if he'd contributed his optimizations back to the
> > > > official implementation rather than write a whole new one...
> >
> > > > On Fri, Sep 18, 2009 at 1:38 AM, ijuma <ism...@juma.me.uk> wrote:
> >
> > > > > Hey all,
> >
> > > > > I ran across the following and thought it may be of interest to
> > > > > this list:
> > > > >
> > > > > http://hiramchirino.com/blog/2009/09/activemq-protobuf-implementation...
> >
> > > > > Best,
> > > > > Ismael
> >
>

