Thanks very much Jon (see below). You make good points, and I like the approach you describe. I am still thinking, however, that there is power in the ability for message instances to write and parse themselves from a stream.
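To make the idea concrete, here is a minimal, self-contained Python sketch of what such stop-flag framing might look like. Everything here is illustrative and hypothetical, not part of the real protobuf API: the Buffer role is played by io.BytesIO, the Message class has a single varint field, and the stop marker is a key built from an otherwise-unused field number (field 0, which real protobuf never assigns).

```python
import io

STOP = 0x00  # hypothetical stop marker: a key from a reserved/unused field number


def encode_varint(n):
    # Standard base-128 varint encoding (7 payload bits per byte,
    # high bit set on all but the last byte).
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def decode_varint(stream):
    # Reads one varint from the stream; length is discovered while
    # reading the value, so no length prefix is needed.
    shift, result = 0, 0
    while True:
        b = stream.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7


class Message:
    """Toy message with one varint field (field 1, wire type 0)."""

    def __init__(self, value=0):
        self.value = value

    def SerializeToBuffer(self, buf):
        # Key 0x08 = (field 1 << 3) | wire type 0, as in the real format;
        # the trailing STOP byte is the hypothetical end-of-message marker.
        buf.write(bytes([0x08]) + encode_varint(self.value) + bytes([STOP]))

    def ParseFromBuffer(self, buf):
        # Read fields until the stop marker, then return, leaving the
        # buffer positioned at the start of the next serialization.
        while True:
            key = buf.read(1)[0]
            if key == STOP:
                return
            if key == 0x08:
                self.value = decode_varint(buf)


# In the proposal the buffer would chain back to a network connection;
# io.BytesIO stands in for it here.
buf = io.BytesIO()
m1 = Message(300)
m2 = Message(7)
m1.SerializeToBuffer(buf)
m2.SerializeToBuffer(buf)

buf.seek(0)
out1, out2 = Message(), Message()
out1.ParseFromBuffer(buf)  # stops at the first stop marker
out2.ParseFromBuffer(buf)  # then parses the second, concatenated message
```

The point of the sketch is the last four lines: two back-to-back serializations are pulled apart by repeated ParseFromBuffer calls, with no length prefix known up front.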
A message instance could be passed a stream object which chains back to the network connection from which bytes are being received. A stop-flag based parsing mechanism could be handed this buffer object, and would handle reading the stream and initializing the message's properties, exiting when the serialization of that message instance ended. At that point a new message instance could be created and the process repeated. The type of message doing the parsing could vary from message to message, even with the serializations being sent and received back to back.

This mechanism would work regardless of the field types being streamed. A message type consisting solely of varint fields, whose length is determined while reading the varint's value, would support streaming no differently than any other message type. The solution also seems to support every requirement supported by the original buffer type. Messages serialized to a buffer could just as easily be initialized from that buffer as they could from the string contained by the buffer:

  m1 = Message()
  buffer = Buffer()
  [...] (initialize instance vars)
  m1.SerializeToBuffer(buffer)

  m2 = Message()
  m2.ParseFromBuffer(buffer)

produces the same result as:

  m2 = Message()
  bytes = m1.SerializeToString()
  m2.ParseFromString(bytes)

The string-based parse would ignore the stop bit, parsing the entire string. The buffer-based parse would stop when it reached the stop bit, producing the same result. Handling of concatenated serializations is supported through repeated calls to parse from the buffer:

  m1 = Message()
  [...] (initialize instance vars)
  m2 = Message()
  [...]
  (initialize instance vars)

  buffer = Buffer()
  m1.SerializeToBuffer(buffer)
  m2.SerializeToBuffer(buffer)

  m3 = Message()
  m3.ParseFromBuffer(buffer)
  m3.ParseFromBuffer(buffer)

would produce the same result as:

  m3 = Message()
  m3.ParseFromString(m1.SerializeToString() + m2.SerializeToString())

As long as an unused, and never to be used, field number is used to generate the stop bit's key, I don't believe there are any incompatibilities between buffer-based message marshalling and the existing string-based code. A very easy usage:

  # Sending side
  for message in messages:
      message.SerializeToBuffer(buffer)

  # Receiving side
  for msgtype in types:
      message = msgtype()
      message.ParseFromBuffer(buffer)

Unless I've overlooked something, it seems like the stream-based marshalling and unmarshalling is powerful, simple, and completely compatible with all existing code. But there is a very real chance I've overlooked something...

- Shane

-------- Forwarded Message --------
> From: Jon Skeet <[EMAIL PROTECTED]>
> To: Shane Green <[EMAIL PROTECTED]>
> Subject: Re: Streaming
> Date: Fri, 5 Dec 2008 08:19:41 +0000
>
> 2008/12/5 Shane Green <[EMAIL PROTECTED]>
>         Thanks Jon. Those are good points. I rather liked the
>         self-delimiting nature of fields, and thought this method would
>         bring that feature up to the message level, without breaking
>         any of the existing capabilities. So my goal was a message
>         which could truly be streamed; perhaps even sent without
>         knowing its own size up front. Perhaps I overlooked something?
>
> Currently the PB format requires that you know the size of each
> submessage before you send it. You don't need to know the size of the
> whole message, as it's assumed to be the entire size of the
> datastream. It's unfortunate that you do need to provide the whole
> message to the output stream though, unless you want to manually
> serialize the individual fields.
>
> My goal was slightly different - I wanted to be able to stream a
> sequence of messages.
> The most obvious use case (in my view) is a log. Write out a massive
> log file as a sequence of entries, and you can read it back in one at
> a time. It's not designed to help to stream a single huge message
> though.
>
>         Would you mind if I resent my questions to the group? I lack
>         confidence and wanted to make sure I wasn't overlooking
>         something ridiculous, but am thinking that the exchange would
>         be informative.
>
> Absolutely. Feel free to quote anything I've written if you think it
> helps.
>
>         Also, how are you serializing and parsing messages as if they
>         are repeated fields of a container message? Is there a fair
>         bit of parsing or work being done outside the standard
>         protocol-buffer APIs?
>
> There's not a lot of work, to be honest. On the parsing side the main
> difficulty is getting a type-safe delegate to read a message from the
> stream. The writing side is trivial. Have a look at the code:
>
> http://github.com/jskeet/dotnet-protobufs/tree/master/src/ProtocolBuffers/MessageStreamIterator.cs
> http://github.com/jskeet/dotnet-protobufs/tree/master/src/ProtocolBuffers/MessageStreamWriter.cs
>
> There may have been some trivial changes to CodedInputStream - I can't
> remember offhand.
>
> Hope this helps,
> Jon

You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to [EMAIL PROTECTED]. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
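As an aside for readers of this archived thread: the sequence-of-messages technique Jon describes, writing each message as though it were one item of a repeated field in an enclosing container message (tag byte, varint length, then the message bytes), can be sketched in Python roughly as below. The helper names here are illustrative inventions; Jon's actual implementation is the linked C# MessageStreamWriter/MessageStreamIterator classes.

```python
import io


def encode_varint(n):
    # Standard base-128 varint encoding used by the protobuf wire format.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def decode_varint(stream):
    shift, result = 0, 0
    while True:
        b = stream.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7


def write_delimited(stream, payload):
    # Key 0x0A = (field 1 << 3) | wire type 2 (length-delimited): each
    # serialized message looks like item 1 of a repeated field in a
    # notional container message.
    stream.write(b"\x0a")
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    key = stream.read(1)
    if not key:
        return None  # clean end of stream
    assert key == b"\x0a"
    length = decode_varint(stream)
    return stream.read(length)


# Write a "log file" of entries, then read them back one at a time.
# io.BytesIO stands in for the file or socket; the payloads here are raw
# bytes, but in practice each would be message.SerializeToString().
log = io.BytesIO()
for entry in (b"first entry", b"second entry"):
    write_delimited(log, entry)

log.seek(0)
entries = []
while (e := read_delimited(log)) is not None:
    entries.append(e)
```

Note the contrast with the stop-flag proposal above: here each message's size must be known before it is written, which is exactly the constraint Jon describes, but in exchange nothing about the format changes and any existing parser that treats the stream as one container message with a repeated field would still accept it.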