On 14 May 2015, at 15:23, Alan Burlison <alan.burli...@oracle.com> wrote:

I think bundling or forking is the only practical option. I was looking to see 
whether we could provide ProtocolBuffers as an installable option on our platform; 
if it's a version-compatibility nightmare as you say, that's going to be 
difficult, as we really don't want to have to provide multiple versions.

The problem Hadoop has is that its code, especially the HDFS client code, is 
used in a lot of other applications, and they all end up having to be in sync at 
the Java level. Hopefully the protobuf wire format is compatible (that is the whole 
point of the format, after all), but we know from experience that at the JAR level 
it isn't. The upgrade path there was to rebuild every single .proto-derived Java 
class and then switch across the entire dependency tree, with about a month where 
getting the trunk versions of two apps to link was pretty hit and miss.
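
To make "in sync at the Java level" concrete, this is the sort of thing every 
downstream build ends up doing: pinning protobuf-java to exactly the version whose 
protoc generated the classes on its classpath. A minimal sketch only; the 2.5.0 
below is an illustrative version number, not a recommendation.

  <!-- pin protobuf-java across the whole tree to match the protoc
       used to generate the .proto-derived classes -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>2.5.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

Every project in the tree has to move that one number together, which is why the 
upgrade meant weeks of trunk builds that wouldn't link.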

I think everyone came out of that burned:
- scared and unwilling to repeat the experience
- not believing any further Google assertions of library compatibility (see 
also: Guava)

What to do?

  1.  Leave it alone and it slowly ages; when an upgrade eventually happens it can 
be more traumatic. But until that time: nothing breaks.
  2.  Upgrade regularly and you can dramatically break things, so people don't 
upgrade Hadoop itself: they stick with old versions (with issues already fixed 
in later releases), they keep requesting backported fixes into the 
"working" branch, and you end up with two branches of your code to maintain.
  3.  Fork and you take on the maintenance costs of your forked library forever; it 
will implicitly age, and there's the opportunity cost of that work, i.e. better 
things to waste your time on. (A shading sketch follows this list.)
  4.  Rip out protobuf entirely and switch to something else (e.g. Thrift) with 
better stability, tag the proto channels as deprecated, etc. You'd better 
trust the successor's stability and security features before going to that 
effort.
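
For option (3), the usual Maven way to "fork/bundle" is to shade the library into 
a private package, so the copy you ship can never clash with whatever protobuf 
version a downstream app pulls in. A sketch only, assuming the maven-shade-plugin; 
the org.example.shaded package is a made-up placeholder:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <relocations>
            <!-- rewrite com.google.protobuf.* into a private namespace
                 inside the bundled jar -->
            <relocation>
              <pattern>com.google.protobuf</pattern>
              <shadedPattern>org.example.shaded.com.google.protobuf</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>

The cost is the one in the list: the relocated copy is now yours to patch and 
track for security issues, forever.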

Hadoop 2.x has defaulted to option (1).

Now: why do you want to use a later version of protobuf.jar? Is it because "it 
is there"? Or is there a tangible need?

-steve
