Here are some of our thoughts on the discussions of the past few days with respect to backwards compatibility.
In general, at Twitter we're not necessarily against backwards-incompatible changes per se. It depends on the "Return on Pain". While it is hard to quantify the returns in the abstract, I can try to sketch out which kinds of changes are the most painful and therefore cause the most friction for us. In rough order of increasing pain to deal with:

a) There is a new upstream (3.x) release, but it is so backwards incompatible that we won't be able to adopt it for the foreseeable future. Even though we don't adopt it, it still causes pain. Development becomes that much harder because we'd have to get a patch into trunk, a patch into 3.x, and a patch into the 2.x branch. Conversely, if patches go into 2.x only, the releases start drifting apart. We already have several dozen patches in production that have not yet made it upstream, but we are striving to keep this list as short as possible to reduce the rebase pain and risk.

b) Central daemons (the RM, or pairs of HA NNs) have to be restarted, causing a cluster-wide outage. The work in progress on work-preserving restarts in various areas makes these kinds of upgrades less painful.

c) The server side requires a different runtime from the client side. We'd have to produce multiple artifacts, but we could make that work. For example, NN code uses Java 8 features, but clients can still use Java 7 to submit jobs and read/write HDFS.

Now for the more painful backwards incompatibilities:

d) All clients have to recompile (a token uses protobuf instead of thrift, an interface becomes an abstract class or vice versa). Not only do these kinds of changes make a rolling upgrade impossible; more importantly, they require all our clients to recompile their code and redeploy their production pipelines in a coordinated fashion.
On top of this, we have multiple large production clusters, and clients would have to keep multiple incompatible pipelines running, because we simply cannot upgrade all clusters in all datacenters at the same time.

e) Customers are forced to restart and can no longer run with JDK 7 clients because the job submission client code or HDFS has started using JDK 8-only features. Eventually this group will shrink, but for at least another year, if not more, this will be very painful.

f) Even more painful is when Yarn/MapReduce APIs change so that customers not only have to recompile, but also have to change hundreds of scripts / flows in order to deal with the API change. This problem is compounded by other tools in the Hadoop ecosystem that would have to deal with these changes. There would be two different versions of Cascading, HBase, Hive, Pig, Spark, Tez, you name it.

g) Without proper classpath isolation, third-party dependency changes (guava, protobuf version, etc.) are probably as painful as API changes.

h) The HDFS client API gets changed in a backwards-incompatible way, requiring all clients to change their code, recompile, and restart their services in a coordinated way. We have tens of thousands of production servers reading from / writing to Hadoop and cannot have all of these long-running clients restart at the same time.

To put this in perspective: despite us being one of the early adopters of Hadoop 2 in production at the scale of many thousands of nodes, we are still wrapping up the migration from our last Hadoop 1 clusters. We have many war stories about many of the above incompatibilities. As I've tweeted about publicly, the gains from this migration to Hadoop 2 have been significant, but the friction has also been considerable.

To get specific about JDK 8: we are intending to move to Java 8. Right now we're letting clients optionally choose to run tasks with JDK 8; then we'll make it the default. After that we'll switch the daemons to running with JDK 8.
Once that is done, it would be feasible to use JDK 8 features on the server side (see c) above).

I'm suggesting that if we do allow backwards-incompatible changes, we introduce an upgrade path through an agreed-upon stepping-stone release. For example, a protocol changing from thrift to protobuf can be done in steps: in the stepping-stone release both would be accepted; in the following release (or two releases later) the thrift version support is dropped. This would allow for a rolling upgrade, or, even if a cluster-wide restart is needed, at least customers can adapt to the change at a pace of weeks or months. Once no more (important) customers are running the thrift client, we could then roll to the next release.

It would be useful to coordinate the backwards incompatibilities so that not every release becomes a stepping-stone release.

Cheers,
Joep

On Mon, Mar 9, 2015 at 6:04 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:

> Hi Mayank,
>
> > 1. We would be moving to Hadoop -3 (Not this year though) however I don't
> > see we can do another JDK upgrade so soon. So the point I am trying to
> > make is we should be supporting jdk 7 as well for Hadoop-3.
>
> We'll still be releasing 2.x releases for a while, with similar
> featuresets as 3.x. You can keep using 2.x until you feel ready to jump
> to JDK8.
>
> > 2. For the sake of JDK 8 and classpath isolation we shouldn't be making
> > another release as those can be supported in Hadoop 2 as well, so what
> > is the motivation of making Hadoop 3 so soon?
>
> So already you can run 2.x with JDK8 and some degree of classpath
> isolation, but I've discussed the motivation for a 3.0 on the previous
> thread. We had issues in the JDK6 days with our dependencies not
> supporting JDK6 and thus not releasing security or bug fixes, which in
> turn put us in a bad spot.
> Classpath isolation we are still discussing, but right now it's opt-in
> and somewhat incomplete, which makes it hard for downstream projects to
> effectively make use of it. The goal for 3.0 is to clean this up and
> have it on by default (or always).
>
> Best,
> Andrew
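P.S. To make the stepping-stone idea concrete, here is a minimal sketch of what the dual-acceptance window could look like. All names are hypothetical and the real Thrift/Protobuf wire formats are elided (a leading tag byte stands in for format detection); the only point is the dispatch: the stepping-stone release accepts both encodings so old and new clients coexist, and a later release simply deletes the legacy branch.

```java
// Hypothetical sketch of a server in a stepping-stone release that accepts
// two token encodings at once. Real Thrift/Protobuf parsing is elided; a
// leading tag byte stands in for detecting which client generation sent it.
public class SteppingStoneDecoder {

    static final byte LEGACY_THRIFT_TAG = 0x01; // old clients
    static final byte PROTOBUF_TAG = 0x02;      // new clients

    /** Decode a token payload regardless of which client generation sent it. */
    static String decodeToken(byte[] wire) {
        byte tag = wire[0];
        String payload = new String(wire, 1, wire.length - 1,
                java.nio.charset.StandardCharsets.UTF_8);
        switch (tag) {
            case LEGACY_THRIFT_TAG:
                // Accepted only during the stepping-stone window; this
                // branch gets deleted one or two releases later, once no
                // (important) customers still run the thrift client.
                return "thrift:" + payload;
            case PROTOBUF_TAG:
                return "protobuf:" + payload;
            default:
                throw new IllegalArgumentException("unknown token format " + tag);
        }
    }

    public static void main(String[] args) {
        byte[] oldClient = {LEGACY_THRIFT_TAG, 'j', 'o', 'b'};
        byte[] newClient = {PROTOBUF_TAG, 'j', 'o', 'b'};
        System.out.println(decodeToken(oldClient)); // thrift:job
        System.out.println(decodeToken(newClient)); // protobuf:job
    }
}
```

Because the server never requires all clients to switch at once, customers can move over a period of weeks or months, which is the whole point of the stepping-stone release.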