On Oct 3, 2008, at 10:06 PM, Raghu Angadi wrote:
If version handling is required, I think Doug's approach will work well
for the current RPC.

The extra complexity of handling different versions in object
serialization may easily be overestimated (for a duration of one year,
say). I would think that easily more than 90% of objects' serialization
has not changed in the last one to two years.
As long as the innocent are protected (i.e. no existing write() method
needs to change unless its fields change), it will be fine. Often the
effective serialization changes mainly because of new subclasses rather
than the serialization methods themselves.
Do we handle a change of arguments to a method similarly? How are
subclasses handled?

Change to method arguments? One possible solution: create a new method
instead -- this is good enough if it is only needed for a short time
(a rough sketch follows below).

Subclassing? Don't; instead add or delete fields.
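
A minimal sketch of the "new method" idea; the interface and method
names below are made up for illustration and are not actual Hadoop
protocol code:

    import java.io.IOException;

    // Illustrative only: evolve an RPC protocol by adding a method with
    // the new signature, rather than changing the arguments of an
    // existing one.
    public interface FooProtocol {

      // The existing method keeps its exact signature, so old clients
      // and old servers continue to match it.
      void create(String path, short replication) throws IOException;

      // New behavior gets a new method (here an overload adding
      // blockSize).  A newer client talking to an older server must
      // avoid calling it, e.g. by checking the version negotiated at
      // connection time.
      void create(String path, short replication, long blockSize)
          throws IOException;
    }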
Raghu.
Doug Cutting wrote:
> It has been proposed in the discussions defining Hadoop 1.0 that we
> extend our back-compatibility policy.
>
> http://wiki.apache.org/hadoop/Release1.0Requirements
>
> Currently we only attempt to promise that application code will run
> without change against compatible versions of Hadoop. If one has
> clusters running different yet compatible versions, then one must use
> a different classpath for each cluster to pick up the appropriate
> version of Hadoop's client libraries.
>
> The proposal is that we extend this, so that a client library from
> one version of Hadoop will operate correctly with other compatible
> Hadoop versions, i.e., one need not alter one's classpath to contain
> the identical version, only a compatible version.
>
> Question 1: Do we need to solve this problem soon, for release 1.0,
> i.e., in order to provide a release whose compatibility lifetime is
> ~1 year, instead of the ~4 months of 0.x releases? This is not clear
> to me. Can someone provide cases where using the same classpath when
> talking to multiple clusters is critical?
>
> Assuming it is, to implement this requires RPC-level support for
> versioning. We could add this by switching to an RPC mechanism with
> built-in, automatic versioning support, like Thrift, Etch or Protocol
> Buffers. But none of these is a drop-in replacement for Hadoop RPC.
> They will probably not initially meet our performance and scalability
> requirements. Their adoption will also require considerable and
> destabilizing changes to Hadoop. Finally, it is not today clear which
> of these would be the best candidate. If we move too soon, we might
> regret our choice and wish to move again later.
>
> So, if we answer yes to (1) above, wishing to provide RPC
> back-compatibility in 1.0, but do not want to hold up a 1.0 release,
> is there an alternative to switching? Can we provide incremental
> versioning support to Hadoop's existing RPC mechanism that will
> suffice until a clear replacement is available?
>
> Below I suggest a simple versioning style that Hadoop might use to
> permit its RPC protocols to evolve compatibly until an RPC system
> with built-in versioning support is selected. This is not intended to
> be a long-term solution, but rather something that would permit us to
> more flexibly evolve Hadoop's protocols over the next year or so.
>
> This style assumes a globally increasing Hadoop version number. For
> example, this might be the subversion repository version of trunk
> when a change is first introduced.
>
> When an RPC client and server handshake, they exchange version
> numbers. The lower of their two version numbers is selected as the
> version for the connection.
>
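A minimal sketch of such a handshake, assuming a hypothetical helper
class (none of the names below exist in Hadoop today); the value it
returns is what the RPC.getVersion() calls in the example that follows
would hand back to the Writables:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Sketch only: each side announces the version it speaks, and both
    // sides then use the lower of the two for everything sent on this
    // connection.
    public final class VersionHandshake {

      /** The globally increasing Hadoop version this build speaks. */
      public static final int LOCAL_VERSION = 2;

      public static int negotiate(DataInput in, DataOutput out)
          throws IOException {
        out.writeInt(LOCAL_VERSION);            // announce our version
        int remote = in.readInt();              // learn the peer's version
        return Math.min(LOCAL_VERSION, remote); // agreed connection version
      }
    }
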
> Let's walk through an example. We start with a class that contains
> no versioning information and a single field, 'a':
>
> public class Foo implements Writable {
>   int a;
>
>   public void write(DataOutput out) throws IOException {
>     out.writeInt(a);
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     a = in.readInt();
>   }
> }
>
> Now, in version 1, we add a second field, 'b', to this:
>
> public class Foo implements Writable {
>   int a;
>   float b;                        // new field
>
>   public void write(DataOutput out) throws IOException {
>     int version = RPC.getVersion(out);
>     out.writeInt(a);
>     if (version >= 1) {           // peer supports b
>       out.writeFloat(b);          // send it
>     }
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     int version = RPC.getVersion(in);
>     a = in.readInt();
>     if (version >= 1) {           // peer supports b
>       b = in.readFloat();         // read it
>     }
>   }
> }
>
> Next, in version 2, we remove the first field, 'a':
>
> public class Foo implements Writable {
>   float b;
>
>   public void write(DataOutput out) throws IOException {
>     int version = RPC.getVersion(out);
>     if (version < 2) {            // peer wants a
>       out.writeInt(0);            // send it
>     }
>     if (version >= 1) {
>       out.writeFloat(b);
>     }
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     int version = RPC.getVersion(in);
>     if (version < 2) {            // peer writes a
>       in.readInt();               // ignore it
>     }
>     if (version >= 1) {
>       b = in.readFloat();
>     }
>   }
> }
>
> Could something like this work? It would require just some minor
> changes to Hadoop's RPC mechanism, to support the version handshake.
> Beyond that, it could be implemented incrementally as RPC protocols
> evolve. It would require some vigilance, to make sure that versioning
> logic is added when classes change, but adding automated tests
> against prior versions would identify lapses here.
>
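One shape such a cross-version test might take, as a sketch; it assumes
a test-only hook (called RPC.setVersionForTesting here) for forcing the
stream version, which does not exist today:

    import java.io.*;

    // Sketch only: round-trip a Writable while pretending the connection
    // negotiated an older version, to catch missing versioning logic.
    public class TestFooVersioning {

      public void testOldPeerRoundTrip() throws IOException {
        RPC.setVersionForTesting(0);               // hypothetical test hook

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Foo written = new Foo();
        written.write(new DataOutputStream(buf));  // write as a v0 peer would

        Foo read = new Foo();
        read.readFields(new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray())));
        // assert here that the fields present in version 0 round-tripped
      }
    }
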
> This may appear to add a lot of version-related logic, but even with
> automatic versioning, in many cases, some version-related logic is
> still required. In simple cases, one adds a completely new field with
> a default value and is done, with automatic versioning handling much
> of the work. But in many other cases an existing field is changed and
> the application must translate old values to new, and vice versa.
> These cases still require application logic, even with automatic
> versioning. So automatic versioning is certainly less intrusive, but
> not as much as one might first assume.
>
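As a concrete example of the "translate old values" case, a sketch in
the same style as the classes above; the class, the field, and the
version number 3 are made up:

    import java.io.DataInput;
    import java.io.IOException;

    // Sketch only: suppose a timeout field changed from int seconds to
    // long milliseconds in some version 3.  The value conversion is
    // application logic whether the version number comes from a
    // handshake or from an automatic versioning system.
    public class Timeout {
      long timeoutMillis;

      public void readFields(DataInput in) throws IOException {
        int version = RPC.getVersion(in);
        if (version < 3) {
          timeoutMillis = 1000L * in.readInt();  // old peers sent seconds
        } else {
          timeoutMillis = in.readLong();         // new peers send millis
        }
      }
      // write() would mirror this, emitting whole seconds to peers whose
      // negotiated version is below 3.
    }
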
> The fundamental question is how soon we need to address inter-version
> RPC compatibility. If we wish to do it soon, I think we'd be wise to
> consider a solution that's less invasive and that does not force us
> into a potentially premature decision.
>
> Doug