On 05/09/2012 04:41 AM, Evan Jones wrote:
On May 8, 2012, at 21:26 , Jeremy Stribling wrote:
Thanks for the response. As you say, this solution is painful because you
can't enable the optimization until the old version of the program is
completely deprecated. This is somewhat simple in the case that you yourself
are deploying the software, but when you're shipping software to customers (as
we are) and have to support many old versions, it will take a very long time
(possibly years) before you can enable the optimization. Also, it breaks the
downgrade path. Once you enable the optimization, you can never downgrade back
to a version that did not know about the new field.
I think I now understand your problem. You want to add some additional stuff to your .proto file to
indicate the incompatible change, then have the application code not need to know about it? Eg. you
want to write the application code that only accesses "new_my_data" and never needs to
check for "deprecated_my_data", but in fact the underlying protocol buffer supports both
fields, or something like that.
Hey Evan, thanks for the response. That is one way to look at it.
Ideally, the application code would only access my_data(), and it would
magically appear as the new type in the new version of the app and the
old type in the old version of the app. But renaming the field for the
new version is fine too. The important points are twofold: 1) the data
would only appear once on the wire and in storage, and would be translated
if necessary by the receiver to the expected format, and 2) this
translation would work on the downgrade path as well, so that old
applications would be able to interpret data written by new
applications, even if the format of the fields has changed. Sameer
Ajmani's ECOOP paper and thesis work discusses these types of scenarios
(http://pmg.csail.mit.edu/~ajmani/papers/ecoop06-upgrades.pdf).
It seems to me like this starts to end up in the territory of "too high level for the
protocol buffer library itself" since I can't imagine this working without handshaking like
Oliver talked about (e.g. "I understand everything up to version X"). My personal
experience has been more like what Daniel describes: you keep both versions of the field, and your
code has if statements to check for both. I believe this can be made to work, even in your
scenario, but it does require ugly code in your application to handle it. My impression is that you
are trying to avoid that.
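For reference, the keep-both-fields approach described above might look roughly like the following C++ sketch. The struct is a hand-written stand-in for a generated message class, not protoc output. The field names come from earlier in this thread; the field types (string for the old field, integer for the new one), the decimal encoding, and the ReadMyData helper are assumptions made purely for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a generated message that carries both the old
// and the new form of the same logical value.
struct MyMessage {
  bool has_deprecated_my_data = false;
  std::string deprecated_my_data;   // old form (assumed: decimal text)
  bool has_new_my_data = false;
  std::int64_t new_my_data = 0;     // new form (assumed: integer)
};

// The "ugly" application-side check: prefer the new field, fall back to the
// old one, and use a caller-supplied default if neither is set.
// std::stoll throws if the old string is not a valid decimal number.
std::int64_t ReadMyData(const MyMessage& msg, std::int64_t fallback) {
  if (msg.has_new_my_data) {
    return msg.new_my_data;
  }
  if (msg.has_deprecated_my_data) {
    return std::stoll(msg.deprecated_my_data);
  }
  return fallback;
}
```

Every reader of the value has to go through a check like this, which is the application-level clutter being discussed.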
I'm trying to avoid keeping both versions of the data in the wire format,
since in this scenario the whole reason for the change was
optimization. I don't care if the new version of the protobuf has two
separate fields; there just needs to be a way for the old version to
still get at its old data. Involving the application in some way is
totally reasonable and expected; I am just hoping to find a way to add a
translator into the deserialization code, so that it can be upgraded
independently on old instances of the program, to be able to interpret
the new version of the protobuf while still running the old version of
the application code. Here's a specific example:
* There are two nodes, 1 and 2, running version A of the software.
* They exchange messages containing protobuf P, which contains a string
field F.
* We write a new version B of the software, which changes field F to an
integer as an optimization.
* We upgrade node 1, but not node 2.
* If node 1 sends a protobuf P to node 2, I want node 2 to be able to
access field F as a string, even though the wire format sent by node 1
was an integer.
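The scenario above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not protobuf code: WireP, PVersionA, and DecodeForVersionA are hypothetical names, the sender is assumed to tag each message with its own version (1 for A, 2 for B), and the integer-to-string translation is assumed to be a simple decimal rendering. A real translation would be application specific.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for protobuf P as received. Only one form of F is
// ever populated, so the data appears once on the wire.
struct WireP {
  int sender_version;      // 1 = version A, 2 = version B (assumed tagging)
  std::string f_string;    // populated by version A senders
  std::int64_t f_int;      // populated by version B senders
};

// What version A application code expects to see.
struct PVersionA {
  std::string f;
};

// Receiver-side translator. The idea is that this piece could be updated on
// node 2 without upgrading the rest of the version A application.
PVersionA DecodeForVersionA(const WireP& wire) {
  PVersionA out;
  if (wire.sender_version >= 2) {
    // Downgrade path: render the integer in the form old code understands.
    out.f = std::to_string(wire.f_int);
  } else {
    out.f = wire.f_string;
  }
  return out;
}
```

With this shape, node 1 sends the same bytes to every receiver, and only the receiver needs to know how to map the sender's version onto its own.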
Random brainstorming that may not be helpful in any way:
I'm curious about how you end up choosing to solve this, but I think you are going to need to use some
combination of custom field options (to specify the change in a way that protoc can parse?), and then hacks
in the C++ code generator to call your custom upgrade / downgrade code. I think this can work somewhat
seamlessly in the "reading older messages" case (eg. you just add code that says "if we see
the old field, upgrade it to the new field"). However, this can't work in the "writing a newer
message for an older receiver" case without making the Serialize* code aware of the version it should be
*writing*. I think this is going to be pretty application specific?
I think doing it on the deserialize is better, because then we can put
the burden of translation on the receiver, and the sender can merrily
send the same serialized message to multiple receivers (tagged with its
own version) without having to keep track of the version capabilities of
each receiver. This is especially important, as Oliver pointed out,
when the data is not transferred over a live connection but through the
persistent state. It will definitely be app-specific, which was why I
was thinking an insertion point might be the way to go.
My other thought: I think you might be able to get away with writing a protoc
plugin that adds two functions to the class scope (which already exists as an
insertion point):
static UpgradedMessage ParseAnyMessageVersion(…);
string SerializeToVersion(int target_version);
These functions can apply the appropriate upgrade/downgrading as needed.
The catch is that you then need to call the appropriate functions to read/write
the messages. Still, I would argue that since in the serializing case you are
going to need to know the target_version anyway, this might actually work?
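As a rough sketch of what those two functions could do, here is a self-contained C++ toy. It does not use protobuf or a protoc plugin. The encoding is invented for illustration: one leading version byte, followed by decimal text for version 1 or eight big-endian bytes for version 2. The parameter type of ParseAnyMessageVersion and the single field f are likewise assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

class UpgradedMessage {
 public:
  explicit UpgradedMessage(std::int64_t f) : f_(f) {}
  std::int64_t f() const { return f_; }

  // Accepts either toy encoding and always yields the newest in-memory form.
  static UpgradedMessage ParseAnyMessageVersion(const std::string& data) {
    if (data.empty()) throw std::invalid_argument("empty message");
    const int version = static_cast<unsigned char>(data[0]);
    const std::string payload = data.substr(1);
    if (version == 1) {
      return UpgradedMessage(std::stoll(payload));  // upgrade: text to integer
    }
    if (version == 2) {
      if (payload.size() != 8) throw std::invalid_argument("bad v2 payload");
      std::uint64_t bits = 0;
      for (unsigned char c : payload) bits = (bits << 8) | c;
      return UpgradedMessage(static_cast<std::int64_t>(bits));
    }
    throw std::invalid_argument("unknown version");
  }

  // Writes the encoding that a receiver at target_version understands.
  std::string SerializeToVersion(int target_version) const {
    assert(target_version == 1 || target_version == 2);
    std::string out(1, static_cast<char>(target_version));
    if (target_version == 1) {
      out += std::to_string(f_);                    // downgrade: integer to text
    } else {
      const std::uint64_t bits = static_cast<std::uint64_t>(f_);
      for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((bits >> shift) & 0xFF));
      }
    }
    return out;
  }

 private:
  std::int64_t f_;
};
```

Note that the caller of SerializeToVersion still has to learn target_version from somewhere, for example a handshake, which is the application-specific part.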
That's a good thought, but calling custom methods to do the
(de)serialization is a bit hard since the data could be coming in from a
string, an input stream, a zero copy stream, a file descriptor, an
array, etc (e.g., all of the possible ParseFrom* methods). That's why I
was trying to figure out a way to insert custom code deeper in the
stack, such as MergePartialFromCodedStream.
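To illustrate why a hook deeper in the stack is attractive, here is a small self-contained C++ sketch. ToyMessage is a hypothetical class, not the protobuf API: its public entry points are loosely modeled on the ParseFrom* family, and MergeFromBytes plays the role that something like MergePartialFromCodedStream would play. The translator callback and its signature are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <iterator>
#include <string>
#include <utility>

class ToyMessage {
 public:
  // Hypothetical hook: rewrites incoming bytes into the format this
  // version of the application expects.
  using Translator = std::function<std::string(const std::string&)>;
  void set_translator(Translator t) { translator_ = std::move(t); }

  // Several public entry points, one per input source...
  bool ParseFromString(const std::string& data) {
    return MergeFromBytes(data);
  }
  bool ParseFromArray(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    return MergeFromBytes(std::string(static_cast<const char*>(data), size));
  }
  bool ParseFromIstream(std::istream* in) {
    assert(in != nullptr);
    return MergeFromBytes(std::string(std::istreambuf_iterator<char>(*in),
                                      std::istreambuf_iterator<char>()));
  }

  const std::string& f() const { return f_; }

 private:
  // ...all funnel into one core routine, so a translation step placed here
  // covers every input source without the caller changing anything.
  bool MergeFromBytes(const std::string& bytes) {
    f_ = translator_ ? translator_(bytes) : bytes;
    return true;
  }

  Translator translator_;
  std::string f_;
};
```

Callers keep using whichever entry point they already use; only the shared core routine knows that a translation step exists.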
Good luck, and again I'd be interested to know how you do end up solving this.
I'm interested in the same thing. Thanks much for the brainstorming.
Jeremy