On 05/09/2012 04:41 AM, Evan Jones wrote:
On May 8, 2012, at 21:26 , Jeremy Stribling wrote:
Thanks for the response.  As you say, this solution is painful because you 
can't enable the optimization until the old version of the program is 
completely deprecated.  This is somewhat simple in the case that you yourself 
are deploying the software, but when you're shipping software to customers (as 
we are) and have to support many old versions, it will take a very long time 
(possibly years) before you can enable the optimization.  Also, it breaks the 
downgrade path.  Once you enable the optimization, you can never downgrade back 
to a version that did not know about the new field.
I think I now understand your problem. You want to add some additional stuff to your .proto file to 
indicate the incompatible change, then have the application code not need to know about it? Eg. you 
want to write the application code that only accesses "new_my_data" and never needs to 
check for "deprecated_my_data", but in fact the underlying protocol buffer supports both 
fields, or something like that.

Hey Evan, thanks for the response. That is one way to look at it. Ideally, the application code would only access my_data(), and it would magically appear as the new type in the new version of the app and the old type in the old version of the app. But renaming the field for the new version is fine too. The important points are twofold: 1) the data would only appear once on the wire and in storage, and would be translated if necessary by the receiver to the expected format, and 2) this translation would work on the downgrade path as well, so that old applications could interpret data written by new applications, even if the format of the fields has changed. Sameer Ajmani's ECOOP paper and thesis work discuss these types of scenarios (http://pmg.csail.mit.edu/~ajmani/papers/ecoop06-upgrades.pdf).

It seems to me like this starts to end up in the territory of "too high level for the 
protocol buffer library itself" since I can't imagine this working without handshaking like 
Oliver talked about (e.g. "I understand everything up to version X"). My personal 
experience has been more like what Daniel describes: you keep both versions of the field, and your 
code has if statements to check for both. I believe this can be made to work, even in your 
scenario, but it does require ugly code in your application to handle it. My impression is that you 
are trying to avoid that.

I'm trying to avoid keeping both versions of the data in the wire format, since in this scenario the whole reason for the change was optimization. I don't care if the new version of the protobuf has two separate fields; there just needs to be a way for the old version to still get at its old data. Involving the application in some way is totally reasonable and expected; I am just hoping to find a way to add a translator into the deserialization code that can be upgraded independently on old instances of the program, so that they can interpret the new version of the protobuf while still running the old version of the application code. Here's a specific example:

* There are two nodes, 1 and 2, running version A of the software.
* They exchange messages containing protobuf P, which contains a string field F.
* We write a new version B of the software, which changes field F to an integer as an optimization.
* We upgrade node 1, but not node 2.
* If node 1 sends a protobuf P to node 2, I want node 2 to be able to access field F as a string, even though the wire format sent by node 1 was an integer.
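
To make the last bullet concrete, here is a minimal sketch of the kind of translator I have in mind for node 2. It does not use the protobuf library at all: WireF and FAsString are hypothetical names, and the assumption that the old string was just the decimal form of the number is only for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for field F as it arrives off the wire:
// version A writes a string, version B writes an integer.
struct WireF {
  enum Kind { kString, kInt } kind;
  std::string str_value;  // meaningful when kind == kString
  int64_t int_value;      // meaningful when kind == kInt
};

// Accessor used by version A application code on node 2. It always
// returns a string, translating the newer integer form when needed.
// Assumption: the old string was the decimal text of the number.
std::string FAsString(const WireF& f) {
  if (f.kind == WireF::kString) return f.str_value;
  return std::to_string(f.int_value);
}
```

The application code on node 2 keeps calling one accessor and never learns that node 1 changed the field's type.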



Random brainstorming that may not be helpful in any way:

I'm curious about how you end up choosing to solve this, but I think you are going to need to use some 
combination of custom field options (to specify the change in a way that protoc can parse?), and then hacks 
in the C++ code generator  to call your custom upgrade / downgrade code. I think this can work somewhat 
seamlessly in the "reading older messages" case (eg. you just add code that says "if we see 
the old field, upgrade it to the new field"). However, this can't work in the "writing a newer 
message for an older receiver" case without making the Serialize* code aware of the version it should be 
*writing*. I think this is going to be pretty application specific?

I think doing it on the deserialization side is better, because then we can put the burden of translation on the receiver, and the sender can merrily send the same serialized message to multiple receivers (tagged with its own version) without having to keep track of the version capabilities of each receiver. This is especially important, as Oliver pointed out, when the data is not transferred over a live connection but through the persistent state. It will definitely be app-specific, which was why I was thinking an insertion point might be the way to go.
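
A rough sketch of the receiver-side approach, under stated assumptions: the Envelope struct, the version numbers (1 for version A, 2 for version B), and the toy 8-byte little-endian integer encoding are all made up for illustration and are not the protobuf wire format. The point is only that the sender writes one message stamped with its own version, and each receiver translates to the type it expects.

```cpp
#include <cstdint>
#include <string>

// Hypothetical envelope: the sender stamps its own version and never
// needs to know what the receivers understand.
struct Envelope {
  int sender_version;   // 1 = version A (F is text), 2 = version B (F is an integer)
  std::string f_bytes;  // toy encoding of field F
};

// Toy encoding for version B: 8 bytes, little-endian.
std::string EncodeInt(int64_t v) {
  std::string out(8, '\0');
  uint64_t u = static_cast<uint64_t>(v);
  for (int i = 0; i < 8; ++i) out[i] = static_cast<char>((u >> (8 * i)) & 0xff);
  return out;
}

int64_t DecodeInt(const std::string& bytes) {
  uint64_t u = 0;
  for (std::string::size_type i = 0; i < 8 && i < bytes.size(); ++i)
    u |= static_cast<uint64_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
  return static_cast<int64_t>(u);
}

// Version A receiver: wants F as a string, downgrading if necessary.
std::string ReadAsVersionA(const Envelope& e) {
  if (e.sender_version == 1) return e.f_bytes;
  return std::to_string(DecodeInt(e.f_bytes));
}

// Version B receiver: wants F as an integer, upgrading if necessary.
int64_t ReadAsVersionB(const Envelope& e) {
  if (e.sender_version == 2) return DecodeInt(e.f_bytes);
  return std::stoll(e.f_bytes);  // throws if the old string is not numeric
}
```

The same Envelope can be handed to an old receiver, a new receiver, or written to persistent state, and only the reader's translator has to know about the other version.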

My other thought: I think you might be able to get away with writing a protoc 
plugin that adds two functions to the class scope (which already exists as an 
insertion point):

static UpgradedMessage ParseAnyMessageVersion(…);
string SerializeToVersion(int target_version);

These functions can apply the appropriate upgrading/downgrading as needed. 
The catch is that you then need to call them, rather than the standard 
methods, to read/write the messages. I would argue that since in the 
serializing case you are going to need to know the target_version anyway, 
this might actually work?
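
As a rough illustration of the shape of these two functions, here is a hand-written stand-in for what such a plugin might add. This is not real protoc output: MyMessage, the single integer field, and the toy format (a leading '1' or '2' naming the version, followed by decimal text or 8 raw host-endian bytes) are all assumptions for the sketch.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hand-written stand-in for a generated message class with one field F,
// held in memory in its newest (integer) form.
class MyMessage {
 public:
  explicit MyMessage(int64_t f) : f_(f) {}
  int64_t f() const { return f_; }

  // Writes the format a receiver at target_version expects.
  // Toy format: version 1 = '1' + decimal text, version 2 = '2' + 8 raw bytes.
  std::string SerializeToVersion(int target_version) const {
    if (target_version == 1) return "1" + std::to_string(f_);
    std::string out = "2";
    out.append(reinterpret_cast<const char*>(&f_), sizeof f_);
    return out;
  }

  // Reads either format and upgrades to the in-memory form.
  static MyMessage ParseAnyMessageVersion(const std::string& data) {
    if (data.empty()) return MyMessage(0);
    if (data[0] == '1') return MyMessage(std::stoll(data.substr(1)));
    int64_t f = 0;
    if (data.size() >= 1 + sizeof f) std::memcpy(&f, data.data() + 1, sizeof f);
    return MyMessage(f);
  }

 private:
  int64_t f_;
};
```

Note that the caller of SerializeToVersion has to know the receiver's version up front, which is the handshaking cost mentioned above.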


That's a good thought, but calling custom methods to do the (de)serialization is a bit hard since the data could be coming in from a string, an input stream, a zero copy stream, a file descriptor, an array, etc (e.g., all of the possible ParseFrom* methods). That's why I was trying to figure out a way to insert custom code deeper in the stack, such as MergePartialFromCodedStream.

Good luck, and again I'd be interested to know how you do end up solving this.

I'm interested in the same thing.  Thanks much for the brainstorming.

Jeremy
