[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

fpmc Wed, 02 Feb 2011 10:00:21 -0800

Your proposal has one VisitXXX function for each repeated type.  How
does it handle a message with two repeated fields of the same type?


On Feb 2, 2:30 pm, Kenton Varda <ken...@google.com> wrote:
> On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh <jas...@google.com> wrote:
> > Conceptually this sounds great, the big question to me is whether this
> > should be implemented as an option in the compiler or as a separate plugin.
> > I haven't taken a thorough look at the patch, but I'd guess it adds a decent
> > amount to the core code generator. I have a preference for the plugin
> > approach, but of course I'm primarily an internal protobuf user, so I'm
> > willing to be convinced otherwise :-) Would using a plugin, possibly even
> > shipped with the standard implementation, make this feature too inconvenient
> > to use? Or is there enough demand for this that it warrants implementing as
> > an option?
>
> First of all, note that this feature is off by default.  You have to turn it
> on with the generate_visitors message-level option.  The only new code added
> to the base library is a couple templates in WireFormatLite, which are of
> course never instantiated if you don't generate visitor code.
>
> There are a few reasons I prefer to make this part of the base code
> generator:
>
> - If you look at the patch, you'll see that the code generation for the two
> Guide classes actually shares a lot with the code generation for
> MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
> either we'd have to expose parts of the C++ code generator internals
> publicly (eww) or we'd have to reproduce a lot of code (also eww).
>
> - The Reader and Writer classes directly use WireFormatLite, which is a
> private interface.
>
> - It seems clear that this feature is widely desired by open source users.
>  We're not talking about a niche use case here.
>
> > Regarding the proposed interfaces: I can imagine some applications where
> > the const refs passed to the visitor methods may be too restrictive - the
> > user may instead want to take ownership of the object. e.g., suppose the
> > stream is a series of requests, and each of the visitor handlers needs to
> > start some asynchronous work. It would be good to hear if users have use
> > cases that don't quite fit into this model (or at least if the existing use
> > cases will work).
>
> Interesting point.  In the Reader case, it's creating new objects, so in
> theory it ought to be able to hand off ownership to the Visitor it calls.
>  But, the Walker is walking an existing object and thus clearly cannot give
> up ownership.  It seems clear that some use cases need const references,
> which means that the only way we could support ownership passing is by
> adding another parallel set of methods.  I suppose they could have default
> implementations that delegate to the const reference versions, in which case
> only people who wanted to optimize for them would need to override them.
>  But I'd like to see that this is really desired first -- it's easy enough
> to add later.
>
> Also note that my code currently doesn't reuse message objects, but
> improving it to do so would be straightforward.  A Reader could allocate one
> object of each sub-message type for reuse.  But, it seems like that wouldn't
> play well with ownership-passing.
>
>
>
>
>
>
>
>
>
> > On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda <ken...@google.com> wrote:
>
> >> Hello open source protobuf users,
>
> >> *Background*
>
> >> Probably the biggest deficiency in the open source protocol buffers
> >> libraries today is a lack of built-in support for handling streams of
> >> messages.  True, it's not too hard for users to support it manually, by
> >> prefixing each message with its size as described here:
>
> >>http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...
>
> >> However, this is awkward, and typically requires users to reach into the
> >> low-level CodedInputStream/CodedOutputStream classes and do a lot of work
> >> manually.  Furthermore, many users want to handle streams
> >> of heterogeneous message types.  We tell them to wrap their messages in an
> >> outer type using the "union" pattern:
>
> >>  http://code.google.com/apis/protocolbuffers/docs/techniques.html#union
>
> >> But this is kind of ugly and has unnecessary overhead.
>
> >> These problems never really came up in our internal usage, because inside
> >> Google we have an RPC system and other utility code which builds on top of
> >> protocol buffers and provides appropriate abstraction. While we'd like to
> >> open source this code, a lot of it is large, somewhat messy, and highly
> >> interdependent with unrelated parts of our environment, and no one has had
> >> the time to rewrite it all cleanly (as we did with protocol buffers 
> >> itself).
>
> >> *Proposed solution:  Generated Visitors*
>
> >> I've been wanting to fix this for some time now, but didn't really have a
> >> good idea how.  CodedInputStream is annoyingly low-level, but I couldn't
> >> think of much better an interface for reading a stream of messages off the
> >> wire.
>
> >> A couple weeks ago, though, I realized that I had been failing to consider
> >> how new kinds of code generation could help this problem.  I was trying to
> >> think of solutions that would go into the protobuf base library, not
> >> solutions that were generated by the protocol compiler.
>
> >> So then it became pretty clear:  A protobuf message definition can also be
> >> interpreted as a definition for a streaming protocol.  Each field in the
> >> message is a kind of item in the stream.
>
> >>   // A stream of Foo and Bar messages, and also strings.
> >>   message MyStream {
> >>     option generate_visitors = true;  // enables generation of streaming
> >> classes
> >>     repeated Foo foo = 1;
> >>     repeated Bar bar = 2;
> >>     repeated string baz = 3;
> >>   }
>
> >> All we need to do is generate code appropriate for treating MyStream as a
> >> stream, rather than one big message.
>
> >> My approach is to generate two interfaces, each with two provided
> >> implementations.  The interfaces are "Visitor" and "Guide".
> >>  MyStream::Visitor looks like this:
>
> >>   class MyStream::Visitor {
> >>    public:
> >>     virtual ~Visitor();
>
> >>     virtual void VisitFoo(const Foo& foo);
> >>     virtual void VisitBar(const Bar& bar);
> >>     virtual void VisitBaz(const std::string& baz);
> >>   };
>
> >> The Visitor class has two standard implementations:  "Writer" and
> >> "Filler".  MyStream::Writer writes the visited fields to a
> >> CodedOutputStream, using the same wire format as would be used to encode
> >> MyStream as one big message.  MyStream::Filler fills in a MyStream message
> >> object with the visited values.
>
> >> Meanwhile, Guides are objects that drive Visitors.
>
> >>   class MyStream::Guide {
> >>    public:
> >>     virtual ~Guide();
>
> >>     // Call the methods of the visitor on the Guide's data.
> >>     virtual void Accept(MyStream::Visitor* visitor) = 0;
>
> >>     // Just fill in a message object directly rather than use a visitor.
> >>     virtual void Fill(MyStream* message) = 0;
> >>   };
>
> >> The two standard implementations of Guide are "Reader" and "Walker".
> >>  MyStream::Reader reads items from a CodedInputStream and passes them to 
> >> the
> >> visitor.  MyStream::Walker walks over a MyStream message object and passes
> >> all the fields to the visitor.
>
> >> To handle a stream of messages, simply attach a Reader to your own Visitor
> >> implementation.  Your visitor's methods will then be called as each item is
> >> parsed, kind of like "SAX" XML parsing, but type-safe.
>
> >> *Nonblocking I/O*
>
> >> The "Reader" type declared above is based on blocking I/O, but many users
> >> would prefer a non-blocking approach.  I'm less sure how to handle this, 
> >> but
> >> my thought was that we could provide a utility class like:
>
> >>   class NonblockingHelper {
> >>    public:
> >>     template <typename MessageType>
> >>     NonblockingHelper(typename MessageType::Visitor* visitor);
>
> >>     // Push data into the buffer.  If the data completes any fields,
> >>     // they will be passed to the underlying visitor.  Any left-over data
> >>     // is remembered for the next call.
> >>     void PushData(void* data, int size);
> >>   };
>
> >> With this, you can use whatever non-blocking I/O mechanism you want, and
> >> just have to push the data into the NonblockingHelper, which will take care
> >> of calling the Visitor as necessary.
>
> >> *C++ implementation*
>
> >> I've written up a patch implementing this for C++ (not yet including the
> >> nonblocking part):
>
> >>  http://codereview.appspot.com/4077052
>
> >> *Feedback*
>
> >> What do you think?
>
> >> I know I'm excited to use this in some of my own side projects (which is
> >> why I spent my weekend working on it), but before adding this to the
> >> official implementation we should make sure it is broadly useful.

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.

[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

Reply via email to