Your proposal has one VisitXXX function for each repeated type. How does it handle a message with two repeated fields of the same type?
On Feb 2, 2:30 pm, Kenton Varda <ken...@google.com> wrote: > On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh <jas...@google.com> wrote: > > Conceptually this sounds great, the big question to me is whether this > > should be implemented as an option in the compiler or as a separate plugin. > > I haven't taken a thorough look at the patch, but I'd guess it adds a decent > > amount to the core code generator. I have a preference for the plugin > > approach, but of course I'm primarily an internal protobuf user, so I'm > > willing to be convinced otherwise :-) Would using a plugin, possibly even > > shipped with the standard implementation, make this feature too inconvenient > > to use? Or is there enough demand for this that it warrants implementing as > > an option? > > First of all, note that this feature is off by default. You have to turn it > on with the generate_visitors message-level option. The only new code added > to the base library is a couple templates in WireFormatLite, which are of > course never instantiated if you don't generate visitor code. > > There are a few reasons I prefer to make this part of the base code > generator: > > - If you look at the patch, you'll see that the code generation for the two > Guide classes actually shares a lot with the code generation for > MergeFromCodedStream and SerializeWithCachedSizes. To make this a plugin, > either we'd have to expose parts of the C++ code generator internals > publicly (eww) or we'd have to reproduce a lot of code (also eww). > > - The Reader and Writer classes directly use WireFormatLite, which is a > private interface. > > - It seems clear that this feature is widely desired by open source users. > We're not talking about a niche use case here. > > > Regarding the proposed interfaces: I can imagine some applications where > > the const refs passed to the visitor methods may be too restrictive - the > > user may instead want to take ownership of the object. e.g., suppose the > > stream is a series of requests, and each of the visitor handlers needs to > > start some asynchronous work. It would be good to hear if users have use > > cases that don't quite fit into this model (or at least if the existing use > > cases will work). > > Interesting point. In the Reader case, it's creating new objects, so in > theory it ought to be able to hand off ownership to the Visitor it calls. > But, the Walker is walking an existing object and thus clearly cannot give > up ownership. It seems clear that some use cases need const references, > which means that the only way we could support ownership passing is by > adding another parallel set of methods. I suppose they could have default > implementations that delegate to the const reference versions, in which case > only people who wanted to optimize for them would need to override them. > But I'd like to see that this is really desired first -- it's easy enough > to add later. > > Also note that my code currently doesn't reuse message objects, but > improving it to do so would be straightforward. A Reader could allocate one > object of each sub-message type for reuse. But, it seems like that wouldn't > play well with ownership-passing. > > > > > > > > > > > On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda <ken...@google.com> wrote: > > >> Hello open source protobuf users, > > >> *Background* > > >> Probably the biggest deficiency in the open source protocol buffers > >> libraries today is a lack of built-in support for handling streams of > >> messages. True, it's not too hard for users to support it manually, by > >> prefixing each message with its size as described here: > > >>http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre... > > >> However, this is awkward, and typically requires users to reach into the > >> low-level CodedInputStream/CodedOutputStream classes and do a lot of work > >> manually. Furthermore, many users want to handle streams > >> of heterogeneous message types. We tell them to wrap their messages in an > >> outer type using the "union" pattern: > > >> http://code.google.com/apis/protocolbuffers/docs/techniques.html#union > > >> But this is kind of ugly and has unnecessary overhead. > > >> These problems never really came up in our internal usage, because inside > >> Google we have an RPC system and other utility code which builds on top of > >> protocol buffers and provides appropriate abstraction. While we'd like to > >> open source this code, a lot of it is large, somewhat messy, and highly > >> interdependent with unrelated parts of our environment, and no one has had > >> the time to rewrite it all cleanly (as we did with protocol buffers > >> itself). > > >> *Proposed solution: Generated Visitors* > > >> I've been wanting to fix this for some time now, but didn't really have a > >> good idea how. CodedInputStream is annoyingly low-level, but I couldn't > >> think of much better an interface for reading a stream of messages off the > >> wire. > > >> A couple weeks ago, though, I realized that I had been failing to consider > >> how new kinds of code generation could help this problem. I was trying to > >> think of solutions that would go into the protobuf base library, not > >> solutions that were generated by the protocol compiler. > > >> So then it became pretty clear: A protobuf message definition can also be > >> interpreted as a definition for a streaming protocol. Each field in the > >> message is a kind of item in the stream. > > >> // A stream of Foo and Bar messages, and also strings. > >> message MyStream { > >> option generate_visitors = true; // enables generation of streaming > >> classes > >> repeated Foo foo = 1; > >> repeated Bar bar = 2; > >> repeated string baz = 3; > >> } > > >> All we need to do is generate code appropriate for treating MyStream as a > >> stream, rather than one big message. > > >> My approach is to generate two interfaces, each with two provided > >> implementations. The interfaces are "Visitor" and "Guide". > >> MyStream::Visitor looks like this: > > >> class MyStream::Visitor { > >> public: > >> virtual ~Visitor(); > > >> virtual void VisitFoo(const Foo& foo); > >> virtual void VisitBar(const Bar& bar); > >> virtual void VisitBaz(const std::string& baz); > >> }; > > >> The Visitor class has two standard implementations: "Writer" and > >> "Filler". MyStream::Writer writes the visited fields to a > >> CodedOutputStream, using the same wire format as would be used to encode > >> MyStream as one big message. MyStream::Filler fills in a MyStream message > >> object with the visited values. > > >> Meanwhile, Guides are objects that drive Visitors. > > >> class MyStream::Guide { > >> public: > >> virtual ~Guide(); > > >> // Call the methods of the visitor on the Guide's data. > >> virtual void Accept(MyStream::Visitor* visitor) = 0; > > >> // Just fill in a message object directly rather than use a visitor. > >> virtual void Fill(MyStream* message) = 0; > >> }; > > >> The two standard implementations of Guide are "Reader" and "Walker". > >> MyStream::Reader reads items from a CodedInputStream and passes them to > >> the > >> visitor. MyStream::Walker walks over a MyStream message object and passes > >> all the fields to the visitor. > > >> To handle a stream of messages, simply attach a Reader to your own Visitor > >> implementation. Your visitor's methods will then be called as each item is > >> parsed, kind of like "SAX" XML parsing, but type-safe. > > >> *Nonblocking I/O* > > >> The "Reader" type declared above is based on blocking I/O, but many users > >> would prefer a non-blocking approach. I'm less sure how to handle this, > >> but > >> my thought was that we could provide a utility class like: > > >> class NonblockingHelper { > >> public: > >> template <typename MessageType> > >> NonblockingHelper(typename MessageType::Visitor* visitor); > > >> // Push data into the buffer. If the data completes any fields, > >> // they will be passed to the underlying visitor. Any left-over data > >> // is remembered for the next call. > >> void PushData(void* data, int size); > >> }; > > >> With this, you can use whatever non-blocking I/O mechanism you want, and > >> just have to push the data into the NonblockingHelper, which will take care > >> of calling the Visitor as necessary. > > >> *C++ implementation* > > >> I've written up a patch implementing this for C++ (not yet including the > >> nonblocking part): > > >> http://codereview.appspot.com/4077052 > > >> *Feedback* > > >> What do you think? > > >> I know I'm excited to use this in some of my own side projects (which is > >> why I spent my weekend working on it), but before adding this to the > >> official implementation we should make sure it is broadly useful. -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.